Single-index models are natural extensions of linear models and circumvent
the so-called curse of dimensionality. They are becoming increasingly popular
in many scientific fields including biostatistics, medicine, economics and
financial econometrics. Estimating and testing the model index coefficients
$\beta$ is one of the most important objectives in the statistical analysis.
However, the commonly used assumption on the index coefficients,
$\|\beta\| = 1$, represents a nonregular problem: the true index is on the
boundary of the unit ball. In this paper we introduce the EFM approach, a
method of estimating functions, to study the single-index model. The procedure
is to first relax the equality constraint to one with $(d-1)$ components of
$\beta$ lying in an open unit ball, and then to construct the associated
$(d-1)$ estimating functions by projecting the score function to the linear
space spanned by the residuals with the unknown link being estimated by kernel
estimating functions. The root-$n$ consistency and asymptotic normality for
the estimator obtained from solving the resulting estimating equations are
achieved, and a Wilks type theorem for testing the index is demonstrated. A
noticeable result we obtain is that our estimator for $\beta$ has smaller or
equal limiting variance than the estimator of Carroll et al. [J. Amer.
Statist. Assoc. 92 (1997) 447-489]. A fixed-point iterative scheme for
computing this estimator is proposed. This algorithm only involves
one-dimensional nonparametric smoothers, thereby avoiding the data sparsity
problem caused by high model dimensionality. Numerical studies based on
simulation and on applications suggest that this new estimating system is
quite powerful and easy to implement.
1. Introduction. Single-index models combine flexibility of modeling with
interpretability of (linear) coefficients. They circumvent the curse of
dimensionality and are becoming increasingly popular in many scientific fields.
The reduction of dimension is achieved by assuming the link function to
be a univariate function applied to the projection of the explanatory covariate
vector onto some direction. In this paper we consider an extension of
single-index models where, instead of a distributional assumption, assumptions
on only the mean function and variance function of the response are made. Let
$(Y_i, X_i)$, $i = 1, \ldots, n$, denote the observed values with $Y_i$ being
the response variable and $X_i$ the vector of $d$ explanatory variables. The
relationship of the mean and variance of $Y_i$ is specified as follows:

(1.1)  $E(Y_i \mid X_i) = \mu\{g(\beta^T X_i)\}, \qquad \mathrm{Var}(Y_i \mid X_i) = \sigma^2 V\{g(\beta^T X_i)\},$

where $\mu$ is a known monotonic function, $V$ is a known covariance function,
$g$ is an unknown univariate link function and $\beta$ is an unknown index
vector which belongs to the parameter space
$\Theta = \{\beta = (\beta_1, \ldots, \beta_d)^T : \|\beta\| = 1, \beta_1 > 0, \beta \in \mathbb{R}^d\}$.
Here we assume the parameter space is $\Theta$ rather than the entire
$\mathbb{R}^d$ in order to ensure that $\beta$ in the representation (1.1) can
be uniquely defined. This is a commonly used assumption on the index parameter
[see Carroll et al. (1997), Zhu and Xue (2006), Lin and Kulasekera (2007)].
Another reparameterization is to let $\beta_1 = 1$ for the sign identifiability
and to transform $\beta$ to
$(1, \beta_2, \ldots, \beta_d)^T / (1 + \sum_{r=2}^d \beta_r^2)^{1/2}$ for the
scale identifiability. Clearly
$(1, \beta_2, \ldots, \beta_d)^T / (1 + \sum_{r=2}^d \beta_r^2)^{1/2}$ can also
span the parameter space $\Theta$ by simply checking that
$\|(1, \beta_2, \ldots, \beta_d)^T / (1 + \sum_{r=2}^d \beta_r^2)^{1/2}\| = 1$
and the first component $1/(1 + \sum_{r=2}^d \beta_r^2)^{1/2} > 0$. However,
the fixed-point algorithm recommended in this paper for normalized vectors may
not be suitable for such a reparameterization. Model (1.1) is flexible enough
to cover a variety of situations. If $\mu$ is the identity function and $V$ is
equal to constant 1, (1.1) reduces to a single-index model [Härdle, Hall and
Ichimura (1993)]. Model (1.1) is an extension of the generalized linear model
[McCullagh and Nelder (1989)] and the single-index model. When the conditional
distribution of $Y$ is logistic, then
$\mu\{g(\beta^T X)\} = \exp\{g(\beta^T X)\}/[1 + \exp\{g(\beta^T X)\}]$ and
$V\{g(\beta^T X)\} = \exp\{g(\beta^T X)\}/[1 + \exp\{g(\beta^T X)\}]^2$.
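As a small illustration of the logistic case just described, the following Python sketch evaluates the mean function $\mu$ and the variance function $V$ at a given value of the link. The function names `mu_logistic` and `var_logistic` are hypothetical and are introduced only for this example.

```python
import math

def mu_logistic(g):
    """Mean function mu{g} = exp(g) / [1 + exp(g)] at a link value g."""
    return 1.0 / (1.0 + math.exp(-g))

def var_logistic(g):
    """Variance function V{g} = exp(g) / [1 + exp(g)]^2, which equals mu (1 - mu)."""
    m = mu_logistic(g)
    return m * (1.0 - m)

if __name__ == "__main__":
    for g in (-2.0, 0.0, 2.0):
        print(g, mu_logistic(g), var_logistic(g))
```

The identity $V = \mu(1 - \mu)$ used in `var_logistic` is just an algebraic rewriting of the displayed formula.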

For single-index models, that is, $\mu\{g(\beta^T X)\} = g(\beta^T X)$ and
$V\{g(\beta^T X)\} = 1$, various strategies for estimating $\beta$ have been
proposed in the last decades. The two most popular methods are the average
derivative method (ADE) introduced in Powell, Stock and Stoker (1989) and
Härdle and Stoker (1989), and the simultaneous minimization method of Härdle,
Hall and Ichimura (1993). We briefly review these two methods. The ADE method
is based on the fact that
$\partial E(Y \mid X = x)/\partial x = g'(\beta^T x)\beta$, which implies that
the gradient of the regression function is proportional to the index parameter
$\beta$. Then a natural estimator for $\beta$ is
$\hat\beta = n^{-1} \sum_{i=1}^n \widehat{\nabla G}(X_i) / \| n^{-1} \sum_{i=1}^n \widehat{\nabla G}(X_i) \|$
with $\nabla G(x)$ denoting $\partial E(Y \mid X = x)/\partial x$ and
$\|\cdot\|$ being the Euclidean norm. An advantage of the ADE approach is that
it allows estimating $\beta$ directly. However, the high-dimensional kernel
smoothing used for computing $\widehat{\nabla G}(x)$ suffers from the "curse of
dimensionality" if the model dimension $d$ is large. Hristache, Juditski and
Spokoiny (2001) improved the ADE approach by lowering the dimension of the
kernel gradually. The method of Härdle, Hall and Ichimura (1993) is carried out
by minimizing a least squares criterion based on nonparametric estimation of
the link $g$ with respect to $\beta$ and bandwidth $h$. However, the
minimization is difficult to implement since it depends on an optimization
problem in a high-dimensional space. Xia et al. (2002) proposed to minimize
average conditional variance (MAVE). Because the kernel used for computing
$\beta$ is a function of $\|X_i - X_j\|$, MAVE meets the problem of data
sparseness. All the above estimators are consistent under some regularity
conditions. Asymptotic efficiency comparisons of the above methods have been
discussed in Xia (2006), resulting in the MAVE estimator of $\beta$ having the
same limiting variance as the estimators of Härdle, Hall and Ichimura (1993),
and claiming alternative versions of the ADE method having larger variance. In
addition, Yu and Ruppert (2002) fitted the partially linear single-index models
using a penalized spline method. Huh and Park (2002) used the local polynomial
method to fit the unknown function in single-index models. Other dimension
reduction methods that were recently developed in the literature are sliced
inverse regression, partial least squares and the canonical correlation method.
These methods handle high-dimensional predictors; see Zhu and Zhu (2009a,
2009b) and Zhou and He (2008).

The main challenges of estimation in the semiparametric model (1.1) are
that the support of the infinite-dimensional nuisance parameter $g(\cdot)$
depends on the finite-dimensional parameter $\beta$, and the parameter $\beta$
is on the boundary of a unit ball. For estimating $\beta$ the former challenge
forces us to deal with the infinite-dimensional nuisance parameter $g$. The
latter one represents a nonregular problem. The classic assumptions about
asymptotic properties of the estimates for $\beta$ are not valid. In addition,
as a model proposed for dimension reduction, the dimension $d$ may be very high
and one often meets the problem of computation. To attack the above problems,
in this paper we will develop an estimating function method (EFM) and
then introduce a computational algorithm to solve the equations based on
a fixed-point iterative scheme. We first choose an identifiable
parameterization which transforms the boundary of a unit ball in
$\mathbb{R}^d$ to the interior of a unit ball in $\mathbb{R}^{d-1}$. By
eliminating $\beta_1$, the parameter space $\Theta$ can be rearranged to a form
$\{((1 - \sum_{r=2}^d \beta_r^2)^{1/2}, \beta_2, \ldots, \beta_d)^T : \sum_{r=2}^d \beta_r^2 < 1\}$.
Then the derivatives of a function with respect to
$(\beta_2, \ldots, \beta_d)^T$ are readily obtained by the chain rule and the
classical assumptions on the asymptotic normality hold after transformation.
The estimating functions (equations) for $\beta$ can be constructed by
replacing $g(\beta^T X)$ with $\hat g(\beta^T X)$. The estimate $\hat g$ for
the nuisance parameter $g$ is obtained using kernel estimating functions and
the smoothing parameter $h$ is selected using $K$-fold cross-validation.
For the problem of testing the index, we establish a quasi-likelihood ratio
based on the proposed estimating functions and show that the test statistics
asymptotically follow a $\chi^2$-distribution whose degree of freedom does not
depend on nuisance parameters, under the null hypothesis. Then a Wilks
type theorem for testing the index is demonstrated.

The proposed EFM technique is essentially a unified method of handling 
different types of data situations including categorical response variable and 
discrete explanatory covariate vector. The main results of this research are 
as follows: 

(a) Efficiency. A surprising result we obtain is that our EFM estimator for
    $\beta$ has smaller or equal limiting variance than the estimator of
    Carroll et al. (1997).

(b) Computation. The estimating function system only involves
    one-dimensional nonparametric smoothers, thereby avoiding the data
    sparsity problem caused by high model dimensionality. Unlike the
    quasi-likelihood inference [Carroll et al. (1997)] where the maximization
    is difficult to implement when $d$ is large, the reparameterization and
    the explicit formulation of the estimating functions facilitate an
    efficient computation algorithm. Here we use a fixed-point iterative
    scheme to compute the resultant estimator. The simulation results show
    that the algorithm adapts to higher model dimension and richer data
    situations than the MAVE method of Xia et al. (2002).

It is noteworthy that the EFM approach proposed in this paper cannot
be obtained from the SLS method proposed in Ichimura (1993) and investigated
in Härdle, Hall and Ichimura (1993). SLS minimizes the weighted
least squares criterion
$\sum_{j=1}^n [Y_j - \mu\{g(\beta^T X_j)\}]^2 V^{-1}\{g(\beta^T X_j)\}$, which
leads to a biased estimating equation when we use its derivative if
$V(\cdot)$ does not contain the parameter of interest. It will not in general
provide a consistent estimator [see Heyde (1997), page 4]. Chang, Xue and Zhu
(2010) and Wang et al. (2010) discussed the efficient estimation of the
single-index model for the case of additive noise. However, their methods are
based on the estimating equations induced from least squares rather than
the quasi-likelihood. Thus, their estimation does not have the optimality
property. Also, their comparison is with the estimator of Härdle, Hall and
Ichimura (1993) and its later developments, and it cannot be applied to the
setting under study. In this paper, we investigate the efficiency and
computation of the estimates for the single-index models, and systematically
develop and prove the asymptotic properties of EFM.
The paper is organized as follows. In Section 2, we state the single-index
model, discuss estimation of $g$ using kernel estimating functions and of
$\beta$ using profile estimating functions, and investigate the problem of
testing the index using quasi-likelihood ratio. In Section 3 we provide a
computation algorithm for solving the estimating functions and illustrate the
method with simulation and practical studies. The proofs are deferred to the
Appendix.

2. Estimating function method (EFM) and its large sample properties. 

In this section, which is concerned with inference based on the estimating
function method, the model of interest is determined through specification
of mean and variance functions, up to an unknown vector $\beta$ and
an unknown function $g$. Except for Gaussian data, model (1.1) need not
be a full semiparametric likelihood specification. Note that the parameter
space
$\Theta = \{\beta = (\beta_1, \ldots, \beta_d)^T : \|\beta\| = 1, \beta_1 > 0, \beta \in \mathbb{R}^d\}$
means that $\beta$ is on the boundary of a unit ball and it represents
therefore a nonregular problem. So we first choose an identifiable
parameterization which transforms the boundary of a unit ball in
$\mathbb{R}^d$ to the interior of a unit ball in $\mathbb{R}^{d-1}$. By
eliminating $\beta_1$, the parameter space $\Theta$ can be rearranged to a
form
$\{((1 - \sum_{r=2}^d \beta_r^2)^{1/2}, \beta_2, \ldots, \beta_d)^T : \sum_{r=2}^d \beta_r^2 < 1\}$.
Then the derivatives of a function with respect to
$\beta^{(1)} = (\beta_2, \ldots, \beta_d)^T$ are readily obtained by the
chain rule and the classic assumptions on the asymptotic normality hold
after transformation. This reparameterization is the key to analyzing the
asymptotic properties of the estimates for $\beta$ and to facilitating an
efficient computation algorithm. We will investigate the estimation for $g$
and $\beta$ and propose a quasi-likelihood method to test the statistical
significance of certain variables in the parametric component.
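The reparameterization can be made concrete with a short Python sketch. It maps the free vector $\beta^{(1)}$ in the open unit ball to the full index $\beta$ on the unit sphere and builds the $d \times (d-1)$ matrix of partial derivatives obtained from the chain rule. The function names `full_index` and `jacobian` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def full_index(beta_free):
    """Map beta^(1) = (beta_2, ..., beta_d) in the open unit ball to beta with ||beta|| = 1."""
    beta_free = np.asarray(beta_free, dtype=float)
    r2 = float(beta_free @ beta_free)
    if r2 >= 1.0:
        raise ValueError("beta^(1) must lie strictly inside the unit ball")
    return np.concatenate(([np.sqrt(1.0 - r2)], beta_free))

def jacobian(beta_free):
    """Matrix of partial derivatives d beta / d beta^(1), of size d x (d - 1)."""
    beta_free = np.asarray(beta_free, dtype=float)
    beta1 = np.sqrt(1.0 - beta_free @ beta_free)
    top = -beta_free / beta1                  # derivative of beta_1 with respect to beta^(1)
    return np.vstack([top, np.eye(beta_free.size)])

beta_free = np.array([0.4, -0.2, 0.1])
beta = full_index(beta_free)
J = jacobian(beta_free)
print(beta, np.linalg.norm(beta))
print(J.T @ beta)                             # numerically zero: the columns of J are orthogonal to beta
```

Because $\beta_1 > 0$ strictly inside the ball, the square root is differentiable there, which is what restores the usual regularity.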

2.1. The kernel estimating functions for the non-parametric part g. If
$\beta$ is known, then we estimate $g(\cdot)$ and $g'(\cdot)$ using the local
linear estimating functions. Let $h$ denote the bandwidth parameter, and let
$K(\cdot)$ denote the symmetric kernel density function satisfying
$K_h(\cdot) = h^{-1} K(\cdot/h)$. The estimation method involves local linear
approximation. Denote by $\alpha_0$ and $\alpha_1$ the values of $g$ and $g'$
evaluated at $t$, respectively. The local linear approximation for
$g(\beta^T x)$ in a neighborhood of $t$ is
$\tilde g(\beta^T x) = \alpha_0 + \alpha_1(\beta^T x - t)$.
The estimators $\hat g(t)$ and $\hat g'(t)$ are obtained by solving the kernel
estimating functions with respect to $\alpha_0, \alpha_1$:

(2.1)
$\sum_{j=1}^n K_h(\beta^T X_j - t)\, \mu'\{\tilde g(\beta^T X_j)\}\, V^{-1}\{\tilde g(\beta^T X_j)\}\, [Y_j - \mu\{\tilde g(\beta^T X_j)\}] = 0,$
$\sum_{j=1}^n (\beta^T X_j - t)\, K_h(\beta^T X_j - t)\, \mu'\{\tilde g(\beta^T X_j)\}\, V^{-1}\{\tilde g(\beta^T X_j)\}\, [Y_j - \mu\{\tilde g(\beta^T X_j)\}] = 0.$

Having estimated $\alpha_0, \alpha_1$ at $t$ as $\hat\alpha_0, \hat\alpha_1$,
the local linear estimators of $g(t)$ and $g'(t)$ are
$\hat g(t) = \hat\alpha_0$ and $\hat g'(t) = \hat\alpha_1$, respectively.
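In the special case where $\mu$ is the identity and $V \equiv 1$, the two kernel estimating equations are linear in $(\alpha_0, \alpha_1)$ and reduce to a weighted least squares problem with a closed-form solution. The Python sketch below covers only that special case; the function names are hypothetical, NumPy is assumed, and the Epanechnikov kernel is used as one admissible symmetric kernel.

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov kernel K(t) = 0.75 (1 - t^2) on |t| <= 1, zero elsewhere."""
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t ** 2), 0.0)

def local_linear(u, y, t, h):
    """Solve the two kernel estimating equations at t for identity mu and V = 1.

    u : index values beta^T X_j, y : responses Y_j, h : bandwidth.
    Returns (alpha0_hat, alpha1_hat), the estimates of g(t) and g'(t).
    """
    z = u - t
    w = epanechnikov(z / h) / h               # K_h(beta^T X_j - t)
    s0, s1, s2 = w.sum(), (w * z).sum(), (w * z * z).sum()
    t0, t1 = (w * y).sum(), (w * z * y).sum()
    det = s0 * s2 - s1 ** 2
    if det <= 0:
        raise ValueError("too few observations within the bandwidth window")
    a0 = (s2 * t0 - s1 * t1) / det            # estimate of g(t)
    a1 = (s0 * t1 - s1 * t0) / det            # estimate of g'(t)
    return a0, a1

# Usage on noiseless data from a linear link g(u) = 2 + 3u
u = np.linspace(-1.0, 1.0, 41)
print(local_linear(u, 2.0 + 3.0 * u, t=0.2, h=0.3))
```

For a general $\mu$ and $V$ the equations are nonlinear in $(\alpha_0, \alpha_1)$ and would need an iterative solver, which this sketch does not attempt.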

The key to obtaining the asymptotic normality of the estimates for $\beta$
lies in the asymptotic properties of the estimated nonparametric part. The
following theorem will provide some useful results. The following notation
will be used. Let $\mathcal{X} = \{X_1, \ldots, X_n\}$,
$\rho_l(z) = \{\mu'(z)\}^l V^{-1}(z)$ and
$J = \partial\beta/\partial\beta^{(1)}$ the Jacobian matrix of size
$d \times (d - 1)$ with

$J = \begin{pmatrix} -\beta^{(1)T} / \sqrt{1 - \|\beta^{(1)}\|^2} \\ I_{d-1} \end{pmatrix}, \qquad \beta^{(1)} = (\beta_2, \ldots, \beta_d)^T.$

The moments of $K$ and $K^2$ are denoted, respectively, by, $j = 0, 1, \ldots$,
$\gamma_j = \int t^j K(t)\,dt$ and $\nu_j = \int t^j K^2(t)\,dt$.

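Proposition 1. Under regularity conditions (a), (b), (d) and (e) given
in the Appendix, we have:

(i) With $h \to 0$, $n \to \infty$ such that $h \to 0$ and $nh \to \infty$,
$\forall \beta \in \Theta$, the asymptotic conditional bias and variance of
$\hat g$ are given by

(2.2)
$E\{\{\hat g(\beta^T x) - g(\beta^T x)\}^2 \mid \mathcal{X}\}$
$\quad = \{\tfrac{1}{2}\gamma_2 h^2 g''(\beta^T x)\}^2$
$\qquad + \nu_0 \sigma^2 / [n h f_{\beta^T x}(\beta^T x) \rho_2\{g(\beta^T x)\}]$
$\qquad + o_P(h^4 + n^{-1} h^{-1}).$

(ii) With $h \to 0$, $n \to \infty$ such that $h \to 0$ and
$nh^3 \to \infty$, for the estimates of the derivative $g'$, it holds that

(2.3)
$E\{\{\hat g'(\beta^T x) - g'(\beta^T x)\}^2 \mid \mathcal{X}\}$
$\quad = \{\tfrac{1}{6}\gamma_4 \gamma_2^{-1} h^2 g'''(\beta^T x)$
$\qquad + \tfrac{1}{2}(\gamma_4 \gamma_2^{-1} - \gamma_2) h^2 g''(\beta^T x)$
$\qquad \times [\rho_2'\{g(\beta^T x)\}/\rho_2\{g(\beta^T x)\} + f'_{\beta^T x}(\beta^T x)/f_{\beta^T x}(\beta^T x)]\}^2$
$\qquad + \nu_2 \gamma_2^{-2} \sigma^2 / [n h^3 f_{\beta^T x}(\beta^T x) \rho_2\{g(\beta^T x)\}]$
$\qquad + o_P(h^4 + n^{-1} h^{-3}).$

(iii) With $h \to 0$, $n \to \infty$ such that $h \to 0$ and
$nh^3 \to \infty$, we have that

(2.4)
$E\left\{ \left\| \dfrac{\partial \hat g(\beta^T x)}{\partial \beta^{(1)}} - g'(\beta^T x) J^T \{x - E(x \mid \beta^T x)\} \right\|^2 \,\Big|\, \mathcal{X} \right\} = O_P(h^4 + n^{-1} h^{-3}).$
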
The proof of this proposition appears in the Appendix. Results (i) and
(ii) in Proposition 1 are routine and similar to Carroll, Ruppert and Welsh
(1998). In the situation where $\sigma^2 V = \sigma^2$ and the function $\mu$
is identity, results (i) and (ii) coincide with those given by Fan and Gijbels
(1996). From result (iii), it is seen that
$\partial \hat g(\beta^T x)/\partial \beta^{(1)}$ converges in probability
to $g'(\beta^T x) J^T \{x - E(x \mid \beta^T x)\}$, rather than
$g'(\beta^T x) J^T x$ as if $g$ were known. That is,
$\lim_{n \to \infty} \{\partial \hat g(\beta^T x)/\partial \beta^{(1)}\} \ne \partial\{\lim_{n \to \infty} \hat g(\beta^T x)\}/\partial \beta^{(1)}$,
which means that the convergence in probability and the differentiation of
the sequence $\hat g_n(\beta^T x)$ (as a function of $n$) cannot commute. This
is primarily caused by the fact that the support of the infinite-dimensional
nuisance parameter $g(\cdot)$ depends on the finite-dimensional projection
parameter $\beta$. In contrast, a semiparametric model where the support of
the nuisance parameter is independent of the finite-dimensional parameter is
a partially linear regression model having form
$Y = X^T \theta + \eta(T) + \varepsilon$. It is easy to check that the limit
of $\partial \hat\eta(T)/\partial\theta$ is equal to $-E(X \mid T)$, which is
the derivative of
$\lim_{n \to \infty} \hat\eta(T) = E(Y \mid T) - E(X^T \mid T)\theta$ with
respect to $\theta$. Result (iii) ensures that the proposed estimator does not
require undersmoothing of $g(\cdot)$ to obtain a root-$n$ consistent estimator
for $\beta$ and it is also of its own interest in inference theory
for semiparametric models.

2.2. The asymptotic distribution for the estimates of the parametric part
$\beta$. We will now proceed to the estimation of $\beta \in \Theta$. We need
to estimate the $(d - 1)$-dimensional vector $\beta^{(1)}$, the estimator of
which will be defined via

(2.5)  $\sum_{i=1}^n [\partial \mu\{\hat g(\beta^T X_i)\}/\partial \beta^{(1)}]\, V^{-1}\{\hat g(\beta^T X_i)\}\, [Y_i - \mu\{\hat g(\beta^T X_i)\}] = 0.$

This is the direct analogue of the "ideal" estimating equation for known $g$,
in that it is calculated by replacing $g(t)$ with $\hat g(t)$. An
asymptotically equivalent and easily computed version of this equation is

(2.6)  $\hat G(\beta) := \sum_{i=1}^n J^T \hat g'(\beta^T X_i) \{X_i - \hat h(\beta^T X_i)\}\, \rho_1\{\hat g(\beta^T X_i)\}\, [Y_i - \mu\{\hat g(\beta^T X_i)\}] = 0$

with $J = \partial\beta/\partial\beta^{(1)}$ the Jacobian mentioned above,
$\hat g$ and $\hat g'$ are defined by (2.1), and $\hat h(t)$ the local linear
estimate for
$h(t) = E(X \mid \beta^T X = t) = (h_1(t), \ldots, h_d(t))^T$,

$\hat h(t) = \sum_{i=1}^n b_i(t) X_i \Big/ \sum_{i=1}^n b_i(t),$

where
$b_i(t) = K_h(\beta^T X_i - t)\{S_{n,2}(t) - (\beta^T X_i - t) S_{n,1}(t)\}$,
$S_{n,k}(t) = \sum_{i=1}^n K_h(\beta^T X_i - t)(\beta^T X_i - t)^k$,
$k = 1, 2$. We use (2.6) to estimate $\beta^{(1)}$ in the single-index model,
and then use the fact that $\beta_1 = \sqrt{1 - \|\beta^{(1)}\|^2}$ to obtain
$\hat\beta_1$. The use of (2.6) constitutes in our view a new approach to
estimating single-index models; since (2.6) involves smooth pilot estimation
of $g$, $g'$ and $h$ we call it the Estimation Function Method (EFM) for
$\beta$.
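The local linear estimate $\hat h(t)$ of $E(X \mid \beta^T X = t)$ is a ratio of weighted sums and is straightforward to compute. The Python sketch below implements the weights $b_i(t)$ as written above; the function name `h_hat` and the argument names are hypothetical, NumPy is assumed, and the Epanechnikov kernel is one possible choice of $K$.

```python
import numpy as np

def h_hat(X, beta, t, bw):
    """Local linear estimate of h(t) = E(X | beta^T X = t) with bandwidth bw."""
    z = X @ beta - t                                                   # beta^T X_i - t
    k = np.where(np.abs(z / bw) <= 1.0, 0.75 * (1.0 - (z / bw) ** 2), 0.0) / bw
    s1 = (k * z).sum()                                                 # S_{n,1}(t)
    s2 = (k * z * z).sum()                                             # S_{n,2}(t)
    b = k * (s2 - z * s1)                                              # weights b_i(t)
    total = b.sum()
    if total <= 0:
        raise ValueError("too few observations within the bandwidth window")
    return b @ X / total

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
beta = np.array([2.0, 1.0, 0.0]) / np.sqrt(5.0)
hx = h_hat(X, beta, t=0.1, bw=0.3)
print(hx, beta @ hx)                                                   # beta @ hx reproduces t
```

A useful check is that $\beta^T \hat h(t) = t$ up to rounding, because $\sum_i b_i(t)(\beta^T X_i - t) = 0$ by construction of the weights.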

Remark 1. The estimating equations $\hat G(\beta)$ can be represented as the
gradient vector of the following objective function:

$\hat Q(\beta) = \sum_{i=1}^n Q[\mu\{\hat g(\beta^T X_i)\}, Y_i]$

with $Q[\mu, y] = \int^{\mu} \frac{y - s}{V\{\mu^{-1}(s)\}}\,ds$ and
$\mu^{-1}(\cdot)$ the inverse function of $\mu(\cdot)$. The existence of such
a potential function makes $\hat G(\beta)$ inherit properties of the ideal
likelihood score function. Note that $\{\|\beta^{(1)}\| < 1\}$ is an open,
connected subset of $\mathbb{R}^{d-1}$. By the regularity conditions assumed
on $\mu(\cdot)$, $g(\cdot)$, $V(\cdot)$ (for details see the Appendix), we
know that the quasi-likelihood function $\hat Q(\beta)$ is twice continuously
differentiable on $\{\|\beta^{(1)}\| < 1\}$ such that the global maximum of
$\hat Q(\beta)$ can be achieved at some point. One may ask whether the
solution is unique and also consistent. Some elementary calculations lead
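to the Hessian matrix
$\partial^2 \hat Q(\beta)/\partial\beta^{(1)}\partial\beta^{(1)T}$, because
the partial derivative
$\partial \mu\{\hat g(\beta^T X_i)\}/\partial\beta^{(1)} = \mu'\{\hat g(\beta^T X_i)\}\, \hat g'(\beta^T X_i)\, J^T \{X_i - \hat h(\beta^T X_i)\}$,
then

$\dfrac{1}{n} \dfrac{\partial^2 \hat Q(\beta)}{\partial\beta^{(1)}\partial\beta^{(1)T}} = \dfrac{1}{n} \dfrac{\partial \hat G(\beta)}{\partial\beta^{(1)T}}$

$\quad = \dfrac{1}{n} \sum_{i=1}^n [Y_i - \mu\{\hat g(\beta^T X_i)\}] \Big[ \dfrac{\partial [J^T \{X_i - \hat h(\beta^T X_i)\}]}{\partial\beta^{(1)T}}\, \hat g'(\beta^T X_i)\, \rho_1\{\hat g(\beta^T X_i)\}$

$\qquad + J^T \{X_i - \hat h(\beta^T X_i)\} \dfrac{\partial \hat g'(\beta^T X_i)}{\partial\beta^{(1)T}}\, \rho_1\{\hat g(\beta^T X_i)\}$

$\qquad + J^T \hat g'(\beta^T X_i) \{X_i - \hat h(\beta^T X_i)\} \dfrac{\partial \rho_1\{\hat g(\beta^T X_i)\}}{\partial\beta^{(1)T}} \Big]$

$\qquad - \dfrac{1}{n} \sum_{i=1}^n J^T \hat g'^2(\beta^T X_i) \{X_i - \hat h(\beta^T X_i)\}\{X_i - \hat h(\beta^T X_i)\}^T \rho_2\{\hat g(\beta^T X_i)\}\, J.$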

By the regularity conditions in the Appendix, the multipliers of the residuals
$[Y_i - \mu\{\hat g(\beta^T X_i)\}]$ in the first sum of (2.7) are bounded.
Mimicking the proof of Proposition 1, the first sum can be shown to converge
to 0 in probability as $n$ goes to infinity. The second sum converges to a
negative semidefinite matrix. If the Hessian matrix
$\frac{1}{n} \partial^2 \hat Q(\beta)/\partial\beta^{(1)}\partial\beta^{(1)T}$
is negative definite for all values of $\beta^{(1)}$, $\hat G(\beta)$ has a
unique root. At sample level, however, estimating functions may have more than
one root. For the EFM method, the quasi-likelihood $\hat Q(\beta)$ exists,
which can be used to distinguish local maxima from minima. Thus, we suppose
(2.6) has a unique solution in the following context.

Remark 2. It can be seen from the proof in the Appendix that the
population version of $\hat G(\beta)$ is

(2.7)  $G(\beta) = \sum_{i=1}^n J^T g'(\beta^T X_i) \{X_i - h(\beta^T X_i)\}\, \rho_1\{g(\beta^T X_i)\}\, [Y_i - \mu\{g(\beta^T X_i)\}],$

which is obtained by replacing $\hat g, \hat g', \hat h$ with $g, g', h$ in
(2.6). One important property of (2.7) is that the second Bartlett identity
holds, for any $\beta$:

This property makes the semiparametric efficiency of the EFM (2.6) possible.

Let $\beta^0 = (\beta_1^0, \beta^{(1)0T})^T$ denote the true parameter and
$B^+$ denote the Moore-Penrose inverse of any given matrix $B$. We have the
following asymptotic result for the estimator $\hat\beta^{(1)}$.

Theorem 2.1. Assume the estimating function (2.6) has a unique solution and
denote it by $\hat\beta^{(1)}$. If the regularity conditions (a)-(e) in the
Appendix are satisfied, the following results hold:

(i) With $h \to 0$, $n \to \infty$ such that $(nh)^{-1}\log(1/h) \to 0$,
$\hat\beta^{(1)}$ converges in probability to the true parameter
$\beta^{(1)0}$.

(ii) If $nh^6 \to 0$ and $nh^4 \to \infty$,

(2.8)  $\sqrt{n}(\hat\beta^{(1)} - \beta^{(1)0}) \xrightarrow{\mathcal{L}} N_{d-1}(0, \Sigma_{\beta^{(1)0}}),$

where
$\Sigma_{\beta^{(1)0}} = \{J^T \Omega J\}^{+} |_{\beta^{(1)} = \beta^{(1)0}}$,
$J = \partial\beta/\partial\beta^{(1)}$ and

$\Omega = E[\{XX^T - E(X \mid \beta^T X)E(X^T \mid \beta^T X)\}\, \rho_2\{g(\beta^T X)\}\, \{g'(\beta^T X)\}^2 / \sigma^2].$

Remark 3. Note that $\beta^T \Omega \beta = 0$, so the nonnegative matrix
$\Omega$ degenerates in the direction of $\beta$. If the mean function $\mu$
is the identity function and the variance function is equal to a scale
constant, that is, $\mu\{g(\beta^T X)\} = g(\beta^T X)$,
$\sigma^2 V\{g(\beta^T X)\} = \sigma^2$, the matrix $\Omega$ in Theorem 2.1
reduces to

$\Omega = E[\{XX^T - E(X \mid \beta^T X)E(X^T \mid \beta^T X)\}\, \{g'(\beta^T X)\}^2 / \sigma^2].$

Technically speaking, Theorem 2.1 shows that an undersmoothing approach is
unnecessary and that root-$n$ consistency can be achieved. The asymptotic
covariance $\Sigma_{\beta^{(1)0}}$ in general can be estimated by replacing
terms in its expression by estimates of those terms. The asymptotic normality
of $\hat\beta = (\hat\beta_1, \hat\beta^{(1)T})^T$ will follow from Theorem
2.1 with a simple application of the multivariate delta-method, since
$\hat\beta_1 = \sqrt{1 - \|\hat\beta^{(1)}\|^2}$. According to the results of
Carroll et al. (1997), the asymptotic variance of their estimator is
$\Omega^{+}$. Define the block partition of matrix $\Omega$ as follows:

$\Omega = \begin{pmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22} \end{pmatrix},$

where $\Omega_{11}$ is a positive constant, $\Omega_{12}$ is a
$(d - 1)$-dimensional row vector, $\Omega_{21}$ is a $(d - 1)$-dimensional
column vector and $\Omega_{22}$ is a $(d - 1) \times (d - 1)$ nonnegative
definite matrix.

Corollary 1. Under the conditions of Theorem 2.1, we have

(2.10)  $\sqrt{n}(\hat\beta - \beta^0) \xrightarrow{\mathcal{L}} N_d(0, \Sigma_{\beta^0})$

with
$\Sigma_{\beta^0} = J\{J^T \Omega J\}^{+} J^T |_{\beta^{(1)} = \beta^{(1)0}}$.
Further,

$\Sigma_{\beta^0} \le \Omega^{+} |_{\beta = \beta^0}$

and a strict less-than sign holds when $\det(\Omega_{22}) = 0$. That is, in
this case EFM is more efficient than that of Carroll et al. (1997).

The possible smaller limiting variance derived from the EFM approach
partly benefits from the reparameterization so that the quasi-likelihood can
be adopted. As we know, the quasi-likelihood is often of optimal property.
In contrast, most existing methods treat the estimation of $\beta$ as if it
were done in the framework of linear dimension reduction. The target of linear
dimension reduction is to find the directions that can linearly transform the
original variables vector into a vector of one less dimension. For example,
ADE and SIR are two relevant methods. However, when the link function
$\mu(\cdot)$ is identity, the limiting variance derived here may not be
smaller or equal to the ones of Wang et al. (2010) and Chang, Xue and Zhu
(2010) when the quasi-likelihood of (2.5) is applied.

2.3. Profile quasi-likelihood ratio test. In applications, it is important to
test the statistical significance of added predictors in a regression model.
Here we establish a quasi-likelihood ratio statistic to test the significance
of certain variables in the linear index. The null hypothesis that the model
is correct is tested against a full model alternative. Fan and Jiang (2007)
gave a recent review about generalized likelihood ratio tests. Bootstrap tests
for nonparametric regression, generalized partially linear models and
single-index models have been systematically investigated [see Härdle and
Mammen (1993), Härdle, Mammen and Müller (1998), Härdle, Mammen and
Proença (2001)]. Consider the testing problem:

(2.11)  $H_0: g(\cdot) = g\Big(\sum_{k=1}^{r} \beta_k X_k\Big) \longleftrightarrow H_1: g(\cdot) = g\Big(\sum_{k=1}^{r} \beta_k X_k + \sum_{k=r+1}^{d} \beta_k X_k\Big).$

We mainly focus on testing $\beta_k = 0$, $k = r + 1, \ldots, d$, though the
following test procedure can be easily extended to a general linear testing
$B\tilde\beta = 0$ where $B$ is a known matrix with full row rank and
$\tilde\beta = (\beta_{r+1}, \ldots, \beta_d)^T$. The profile
quasi-likelihood ratio test is defined by

(2.12)  $T_n = 2\Big\{ \sup_{\beta \in \Theta} \hat Q(\beta) - \sup_{\beta \in \Theta,\, \tilde\beta = 0} \hat Q(\beta) \Big\},$

where $\hat Q(\beta) = \sum_{i=1}^n Q[\mu\{\hat g(\beta^T X_i)\}, Y_i]$,
$Q[\mu, y] = \int^{\mu} \frac{y - s}{V\{\mu^{-1}(s)\}}\,ds$ and
$\mu^{-1}(\cdot)$ is the inverse function of $\mu(\cdot)$. The following Wilks
type theorem shows that the distribution of $T_n$ is asymptotically
chi-squared and independent of nuisance parameters.

Theorem 2.2. Under the assumptions of Theorem 2.1, if $\beta_k^0 = 0$,
$k = r + 1, \ldots, d$, then

(2.13)  $T_n \xrightarrow{\mathcal{L}} \chi^2(d - r).$
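
In practice the Wilks type result is used by comparing an observed value of $T_n$ with the upper tail of the $\chi^2(d - r)$ distribution. The Python sketch below computes that tail probability with the standard library only, using the recursion $Q(a + 1, z) = Q(a, z) + z^a e^{-z}/\Gamma(a + 1)$ for the regularized upper incomplete gamma function. The function names `chi2_sf` and `wilks_pvalue` are hypothetical, and the value of $T_n$ in the usage line is made up purely for illustration.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability P(chi2_df > x) for integer df >= 1 and x >= 0."""
    z = 0.5 * x
    if df % 2:                                 # odd df: start from Q(1/2, z) = erfc(sqrt(z))
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:                                      # even df: start from Q(1, z) = exp(-z)
        a, q = 1.0, math.exp(-z)
    while a < 0.5 * df:                        # step a -> a + 1 until a = df / 2
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def wilks_pvalue(t_n, d, r):
    """Asymptotic p-value of the profile quasi-likelihood ratio statistic T_n."""
    return chi2_sf(t_n, d - r)

if __name__ == "__main__":
    # Illustrative only: T_n = 6.2 with d = 5 index coefficients and r = 3 kept under H_0
    print(wilks_pvalue(6.2, d=5, r=3))
```

The approximation is asymptotic, so for small samples a resampling calibration may be preferable; that choice is outside the scope of this sketch.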
3. Numerical studies. 

3.1. Computation of the estimates. Solving the joint estimating equations
(2.1) and (2.6) poses some interesting challenges, since the functions
$\hat g(\beta^T X)$ and $\hat g'(\beta^T X)$ depend on $\beta$ implicitly.
Treating $\beta^T X$ as a new predictor (with given $\beta$), (2.1) gives us
$\hat g, \hat g'$ as in Fan, Heckman and Wand (1995). We therefore focus on
(2.6), as estimating equations. It cannot be solved explicitly, and hence one
needs to find solutions using numerical methods. The Newton-Raphson algorithm
is one of the popular and successful methods for finding roots. However, the
computational speed of this algorithm crucially depends on the initial value.
We propose therefore a fixed-point iterative algorithm that is not very
sensitive to starting values and is adaptive to larger dimension. It is worth
noting that this algorithm can be implemented in the case that $d$ is slightly
larger than $n$, because the resultant procedure only involves
one-dimensional nonparametric smoothers, thereby avoiding the data sparsity
problem caused by high dimensionality. Rewrite the estimating functions as
$\hat G(\beta) = J^T F(\beta)$ with

$F(\beta) = (F_1(\beta), \ldots, F_d(\beta))^T$

and

$F_s(\beta) = \sum_{i=1}^n \{X_{si} - \hat h_s(\beta^T X_i)\}\, \mu'\{\hat g(\beta^T X_i)\}\, \hat g'(\beta^T X_i)\, V^{-1}\{\hat g(\beta^T X_i)\}\, [Y_i - \mu\{\hat g(\beta^T X_i)\}].$

Setting $\hat G(\beta) = 0$, we have that

(3.1)
$-\beta_2 F_1(\beta)/\sqrt{1 - \|\beta^{(1)}\|^2} + F_2(\beta) = 0,$
$-\beta_3 F_1(\beta)/\sqrt{1 - \|\beta^{(1)}\|^2} + F_3(\beta) = 0,$
$\quad\vdots$
$-\beta_d F_1(\beta)/\sqrt{1 - \|\beta^{(1)}\|^2} + F_d(\beta) = 0.$

Note that $\|\beta^{(1)}\|^2 = \sum_{r=2}^d \beta_r^2$,
$\beta_1 = \sqrt{1 - \|\beta^{(1)}\|^2}$ and after some simple calculations,
we can get that

$\beta_1 = |F_1(\beta)| / \|F(\beta)\|, \qquad s = 1,$
$\beta_s^2 = F_s^2(\beta) / \|F(\beta)\|^2, \qquad s \ge 2,$

and $\mathrm{sign}\{\beta_s F_1(\beta)\} = \mathrm{sign}\{F_s(\beta)\}$,
$s \ge 2$. The above equation can also be rewritten as

(3.2)  $\beta\, \dfrac{F_1(\beta)}{\|F(\beta)\|} = \dfrac{|F_1(\beta)|}{\|F(\beta)\|} \cdot \dfrac{F(\beta)}{\|F(\beta)\|}.$

Then solving the equation (2.6) is equivalent to finding a fixed point for
(3.2). Though $\|\beta^{(1)}\| < 1$ holds almost surely in (3.2) and always
$\|\beta\| = 1$, there will be some trouble if (3.2) is directly used as
iterative equations. Note that the value of $\|F(\beta)\|$ is used as
denominator that may sometimes be small, which potentially makes the algorithm
unstable. On the other hand, the convergence rate of the fixed-point iterative
algorithm derived from (3.2) depends on $L$, where
$\|\partial\{F(\beta)/\|F(\beta)\|\}/\partial\beta\| \le L$. For a fast
convergence rate, it technically needs a shrinkage value $L$. An ad hoc fix
introduces a constant $M$, adding $M\beta$ on both sides of (3.2) and dividing
by $F_1(\beta)/\|F(\beta)\| + M$:

$\beta = \dfrac{M}{F_1(\beta)/\|F(\beta)\| + M}\, \beta + \dfrac{|F_1(\beta)|/\|F(\beta)\|^2}{F_1(\beta)/\|F(\beta)\| + M}\, F(\beta),$

where $M$ is chosen such that $F_1(\beta)/\|F(\beta)\| + M \ne 0$. In
addition, to accelerate the rate of convergence, we reduce the derivative of
the term on the right-hand side of the above equality, which can be achieved
by choosing some appropriate $M$. This is the iteration formulation in Step 2.
Here the norm of $\beta_{new}$ is not equal to 1 and we have to normalize it
again. Since the iteration in Step 2 makes $\beta_{new}$ violate the
identifiability constraint with norm 1, we design (3.2) to include the whole
$\beta$ vector. The possibility of renormalization for $\beta_{new}$ avoids
the difficulty of controlling $\|\beta^{(1)}_{new}\| < 1$ in each iteration in
Step 2.

Based on these observations, the fixed-point iterative algorithm is
summarized as:

Step 0. Choose initial values for $\beta$, denoted by $\beta_{old}$.

Step 1. Solve the estimating equation (2.1) with respect to $\alpha$, which
yields $\hat g(\beta_{old}^T X_i)$ and $\hat g'(\beta_{old}^T X_i)$,
$1 \le i \le n$.

Step 2. Update $\beta_{old}$ with
$\beta_{old} = \beta_{new}/\|\beta_{new}\|$ by solving the equation (2.6) in
the fixed-point iteration

$\beta_{new} = \dfrac{M}{F_1(\beta_{old})/\|F(\beta_{old})\| + M}\, \beta_{old} + \dfrac{|F_1(\beta_{old})|/\|F(\beta_{old})\|^2}{F_1(\beta_{old})/\|F(\beta_{old})\| + M}\, F(\beta_{old}),$

where $M$ is a constant satisfying $F_1(\beta)/\|F(\beta)\| + M \ne 0$ for
any $\beta$.

Step 3. Repeat Steps 1 and 2 until
$\max_{1 \le s \le d} |\beta_{new,s} - \beta_{old,s}| \le tol$ is met
with $tol$ being a prescribed tolerance.
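
Steps 0-3 can be sketched in Python for the simplest setting. The sketch makes several assumptions that are not part of the algorithm as stated: $\mu$ is the identity and $V \equiv 1$ (so Step 1 has a closed-form solution), the bandwidth `h` and the constant `M` are fixed by hand instead of being chosen by cross-validation, a Gaussian kernel is swapped in so that every local fit has strictly positive weights, and a `max_iter` cap is added as a safeguard. All function and argument names are hypothetical, and no claim about convergence is made.

```python
import numpy as np

def local_fits(u, t, Y, X, h):
    """Local linear fits at t: g_hat(t), g_hat'(t) and h_hat(t) ~ E(X | index = t)."""
    z = u - t
    w = np.exp(-0.5 * (z / h) ** 2)            # Gaussian kernel weights (assumption)
    s0, s1, s2 = w.sum(), (w * z).sum(), (w * z * z).sum()
    t0, t1 = (w * Y).sum(), (w * z * Y).sum()
    det = s0 * s2 - s1 ** 2
    g = (s2 * t0 - s1 * t1) / det
    dg = (s0 * t1 - s1 * t0) / det
    b = w * (s2 - z * s1)                      # local linear weights b_i(t)
    return g, dg, b @ X / b.sum()

def F(beta, X, Y, h):
    """F(beta) for identity mu and V = 1: sum_i {X_i - h_hat} g_hat' [Y_i - g_hat]."""
    u = X @ beta
    out = np.zeros(X.shape[1])
    for i in range(len(Y)):                    # Step 1, evaluated at each index value
        g, dg, hx = local_fits(u, u[i], Y, X, h)
        out += (X[i] - hx) * dg * (Y[i] - g)
    return out

def efm_fixed_point(X, Y, h=0.3, M=1.5, tol=1e-4, max_iter=50):
    """Fixed-point iteration of Step 2 with renormalization; M > 1 keeps F_1/||F|| + M > 0."""
    d = X.shape[1]
    beta = np.ones(d) / np.sqrt(d)             # Step 0: initial value (1, ..., 1)^T / sqrt(d)
    for _ in range(max_iter):
        Fb = F(beta, X, Y, h)
        nF = np.linalg.norm(Fb)
        if nF == 0.0:                          # exact root of the estimating equation
            break
        new = (M * beta + abs(Fb[0]) / nF ** 2 * Fb) / (Fb[0] / nF + M)
        new /= np.linalg.norm(new)             # renormalize to unit length
        converged = np.max(np.abs(new - beta)) <= tol
        beta = new
        if converged:                          # Step 3: stopping rule
            break
    return beta

rng = np.random.default_rng(1)
n, d = 150, 3
beta_true = np.array([2.0, 1.0, 0.0]) / np.sqrt(5.0)
X = rng.uniform(-1.0, 1.0, size=(n, d))
u = X @ beta_true
Y = u ** 2 * np.exp(u) + 0.1 * rng.standard_normal(n)
beta_hat = efm_fixed_point(X, Y)
print(beta_hat, np.sum(np.abs(beta_hat - beta_true)))
```

Each pass costs $O(n^2)$ kernel evaluations because the smoothers are refitted at every index value, but all smoothing is one-dimensional regardless of $d$.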

The final vector $\beta_{new}/\|\beta_{new}\|$ is the estimator of
$\beta^0$. Similarly to other direct estimation methods [Horowitz and Härdle
(1996)], the preceding calculation is easy to implement. Empirically the
initial value for $\beta$, $(1, 1, \ldots, 1)^T/\sqrt{d}$, can be used in the
calculations. The Epanechnikov kernel function
$K(t) = \frac{3}{4}(1 - t^2) I(|t| \le 1)$ is used. The bandwidth involved in
Step 1 can be chosen to be optimal for estimation of $g(t)$ and $g'(t)$ based
on the observations $\{\beta_{old}^T X_i, Y_i\}$. So the standard bandwidth
selection methods, such as $K$-fold cross-validation, generalized
cross-validation (GCV) and the rule of thumb, can be adopted. In this step, we
recommend $K$-fold cross-validation to determine the optimal bandwidth using
the quasi-likelihood as a criterion function. The $K$-fold cross-validation is
not too computationally intensive provided $K$ does not take too large values
(e.g., $K = 5$). Here we recommend trying a number of smoothing parameters
that smooth the data and picking the one that seems most reasonable. As an
adjustment factor, $M$ will increase the stability of iteration. Ideally, in
each iteration an optimum value for $M$ should be chosen guaranteeing that the
derivative on the right-hand side of the iteration formulation in Step 2 is
close to zero. Following this idea, $M$ will depend on the changes of $\beta$
and $F(\beta)/\|F(\beta)\|$. This will be an expensive task due to the
computation of the derivative on the right-hand side of the iteration
formulation in Step 2. We therefore consider $M$ as a constant that does not
vary across iterations, and select $M$ by the $K$-fold cross-validation
method, according to minimizing the model prediction error. When the dimension
$d$ gets larger, $M$ will get smaller. In our simulation runs, we empirically
search $M$ in the interval $[2/\sqrt{d}, d/2]$. This choice gives good
practical performance.

3.2. Simulation results. 

Example 1 (Continuous response). We report a simulation study to
investigate the finite-sample performance of the proposed estimator and
compare it with the rMAVE [refined MAVE; for details see Xia et al. (2002)]
estimator and the EDR estimator [see Hristache et al. (2001), Polzehl and
Sperlich (2009)]. We consider the following model, similar to that used in
Xia (2006):

$$E(Y \mid \beta^T X) = g(\beta^T X), \qquad g(\beta^T X) = (\beta^T X)^2 \exp(\beta^T X); \qquad \mathrm{Var}(Y \mid \beta^T X) = \sigma^2, \quad \sigma = 0.1. \tag{3.3}$$

Let the true parameter be $\beta = (2, 1, 0, \ldots, 0)^T/\sqrt{5}$. Two sets of
designs for $X$ are considered: Design (A) and Design (B). In Design (A),
$(X_s + 1)/2 \sim \mathrm{Beta}(\tau, 1)$, $1 \le s \le d$ and, in Design (B),
$(X_1 + 1)/2 \sim \mathrm{Beta}(\tau, 1)$ and $P(X_s = \pm 0.5) = 0.5$,
$s = 2, 3, 4, \ldots, d$. The data generated in Design (A) are not
elliptically symmetric. All the components of Design (B) are discrete except
for the first component $X_1$. $Y$ is generated from a normal distribution.
Each simulated data set consists of 400 observations, and 250 replications
are used. The results are shown in Table 1. All rMAVE, EDR and EFM estimates
are close to the true parameter vector for $d = 10$. However, the average
estimation errors from the rMAVE and EDR estimates for $d = 50$ are about 2
and 1.5 times as large as those of the EFM estimates, respectively. This
indicates that the fixed-point algorithm is more adaptive to high dimension.
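The data-generating process of Example 1 can be sketched as below. This is an illustrative reading of model (3.3) and Designs (A) and (B), not the authors' simulation code; the function names and the seeding scheme are assumptions. The helper `abs_error` computes the error measure reported in the tables, the sum of absolute componentwise differences.

```python
import math
import random


def true_beta(d):
    """True index (2, 1, 0, ..., 0)^T / sqrt(5), which has unit norm."""
    return [2.0 / math.sqrt(5.0), 1.0 / math.sqrt(5.0)] + [0.0] * (d - 2)


def draw_x(d, tau, design, rng):
    """One covariate vector under Design (A) or Design (B)."""
    def beta_component():
        # (X + 1) / 2 ~ Beta(tau, 1), so X lies in [-1, 1]
        return 2.0 * rng.betavariate(tau, 1.0) - 1.0

    if design == "A":
        return [beta_component() for _ in range(d)]
    # Design (B): first component continuous, the rest equal +-0.5
    return [beta_component()] + [rng.choice((-0.5, 0.5)) for _ in range(d - 1)]


def simulate_example1(n=400, d=10, tau=0.75, design="A", sigma=0.1, seed=0):
    """Draw (X, Y) with E(Y | index) = index^2 * exp(index) and normal noise."""
    rng = random.Random(seed)
    beta = true_beta(d)
    X, Y = [], []
    for _ in range(n):
        x = draw_x(d, tau, design, rng)
        index = sum(b * v for b, v in zip(beta, x))
        X.append(x)
        Y.append(index ** 2 * math.exp(index) + rng.gauss(0.0, sigma))
    return X, Y, beta


def abs_error(beta_hat, beta):
    """Sum over s of |beta_hat_s - beta_s|."""
    return sum(abs(bh - b) for bh, b in zip(beta_hat, beta))
```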

Example 2 (Binary response). This simulation design assumes an
underlying single-index model for binary responses with

$$P(Y = 1 \mid X) = \mu\{g(\beta^T X)\} = \frac{\exp\{g(\beta^T X)\}}{1 + \exp\{g(\beta^T X)\}}, \qquad g(\beta^T X) = \frac{\exp(5\beta^T X - 2)}{1 + \exp(5\beta^T X - 3)} - 1.5. \tag{3.4}$$






Table 1
Average estimation errors $\sum_{s=1}^{d} |\hat\beta_s - \beta_s|$ for model (3.3)

                       Design (A)                    Design (B)
 d    $\tau$    rMAVE     EDR      EFM        rMAVE     EDR      EFM
10    0.75      0.0559*   0.0520   0.0792     0.0522*   0.0662   0.0690
10    1.5       0.0323*   0.0316   0.0298     0.0417*   0.0593   0.0457
50    0.75      0.9900    0.7271   0.5425     0.9780    0.7712   0.4515
50    1.5       0.3776    0.3062   0.1796     0.4693    0.4103   0.2211

*The values are adopted from Xia (2006).



The underlying coefficients are assumed to be $\beta = (2, 1, 0, \ldots, 0)^T/\sqrt{5}$.
We consider two sets of designs: Design (C) and Design (D). In Design (C),
$X_1$ and $X_2$ follow the uniform distribution $U(-2, 2)$. In Design (D), $X_1$
is also assumed to be uniformly distributed on the interval $(-2, 2)$ and
$(X_2 + 1)/2 \sim \mathrm{Beta}(1, 1)$. Similar designs for generalized partially
linear single-index models are assumed in Kane, Holt and Allen (2004). Here a
sample size of 700 is used for the case $d = 10$ and 3,000 is used for $d = 50$.
The sample sizes differ from those of Example 1 because the two examples
differ in complexity. For this example, 250 replications are simulated and
the results are displayed in Table 2. In this set of simulations, the average
estimation errors from the rMAVE estimates and the EDR estimates are about
1.5 and 1.2 times as large as those of the EFM estimates, under both Design (C)
and Design (D) for $d = 10$ or $d = 50$. The values in the row marked by
$d = 50$ look somewhat larger. This is reasonable, however, because the number
of summands in the average estimation error for $d = 50$ is five times as
large as that for $d = 10$. Again it appears that the EFM procedure achieves
more precise estimators.
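The binary-response mechanism in (3.4) can be written down in a few lines. The sketch below only encodes the link and the Bernoulli draw given a value of the index $\beta^T X$; the function names are assumptions, and the covariate designs are deliberately left out because the text does not spell out every component.

```python
import math
import random


def g(u):
    """Inner function of (3.4): exp(5u - 2) / {1 + exp(5u - 3)} - 1.5."""
    return math.exp(5.0 * u - 2.0) / (1.0 + math.exp(5.0 * u - 3.0)) - 1.5


def prob_y1(u):
    """P(Y = 1 | index = u) = exp{g(u)} / [1 + exp{g(u)}]."""
    return 1.0 / (1.0 + math.exp(-g(u)))


def draw_y(u, rng):
    """Draw a Bernoulli response for a given index value u."""
    return 1 if rng.random() < prob_y1(u) else 0
```

Since $g$ is increasing and bounded between $-1.5$ and $e - 1.5$, the success probability stays strictly inside $(0, 1)$, so both response classes occur for every value of the index.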

Example 3 (A simple model). To illustrate the adaptivity of our
algorithm to high dimension, we consider the following simple single-index
model:

$$Y = (\beta^T X)^2 + \varepsilon. \tag{3.5}$$



Table 2
Average estimation errors $\sum_{s=1}^{d} |\hat\beta_s - \beta_s|$ for model (3.4)

              Design (C)                    Design (D)
 d     rMAVE     EDR      EFM        rMAVE     EDR      EFM
10     0.5017    0.5281   0.4564     0.9614    0.9574   0.7415
50     2.0991    1.2695   1.1744     2.5040    2.4846   1.9908

Table 3
Average estimation errors $\sum_{s=1}^{d} |\hat\beta_s - \beta_s|$ for model (3.5)

$\varepsilon$                                   Method   d = 10   d = 50   d = 100   d = 120
$\varepsilon \sim N(0, 0.2^2)$                  rMAVE    0.0318   0.3484   -         -
                                                EDR      0.0363   0.5020   -         -
                                                EFM      0.0272   0.2302   2.9409    5.0010
$\varepsilon \sim N(0, \exp((2X_1 + X_2)/7))$   rMAVE    0.3427   4.6190   -         -
                                                EDR      0.2542   2.1112   -         -
                                                EFM      0.2201   1.7937   4.1435    6.4973

"-" means that the values cannot be calculated by rMAVE and EDR because of high
dimension.



The true parameter is $\beta = (2, 1, 0, \ldots, 0)^T/\sqrt{5}$; $X$ is generated
from $N_d(2, I)$. Both homogeneous errors and heterogeneous ones are
considered. In the former case, $\varepsilon \sim N(0, 0.2^2)$ and in the latter
case, $\varepsilon = \exp(\sqrt{5}\beta^T X/14)\tilde\varepsilon$ with
$\tilde\varepsilon \sim N(0, 1)$. The latter case is designed to show whether
our method can handle heteroscedasticity. A similar modeling setup was also
used in Wang and Xia (2008), Example 5. The simulated results given in Table 3
are based on 250 replicates with a sample of $n = 100$ observations. An
important observation from this simulation is that the proposed EFM approach
still works even when the dimension of the parameter is equal to or slightly
larger than the number of observations. It can be seen from Table 3 that
our approach also performs well under the heteroscedasticity setup.
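Because $\sqrt{5}\beta^T X = 2X_1 + X_2$ for the true index, the heteroscedastic error has standard deviation $\exp\{(2X_1 + X_2)/14\}$. A minimal sketch of the response draw for model (3.5) follows; the function names are assumptions, and $N_d(2, I)$ is read here as independent $N(2, 1)$ components.

```python
import math
import random


def noise_sd(x):
    """Standard deviation exp((2*x1 + x2) / 14) of the heteroscedastic error."""
    return math.exp((2.0 * x[0] + x[1]) / 14.0)


def draw_response(x, rng, heteroscedastic=True):
    """One response from Y = (beta^T x)^2 + eps for the true beta."""
    index = (2.0 * x[0] + x[1]) / math.sqrt(5.0)
    sd = noise_sd(x) if heteroscedastic else 0.2
    return index ** 2 + sd * rng.gauss(0.0, 1.0)


def draw_x(d, rng):
    """Covariates with independent N(2, 1) components (assumed reading)."""
    return [rng.gauss(2.0, 1.0) for _ in range(d)]
```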

Example 4 (An oscillating function model). A single-index model is
designed as

$$Y = \sin(a\beta^T X) + \varepsilon, \tag{3.6}$$

where $\beta = (2, 1, 0, \ldots, 0)^T/\sqrt{5}$, $X$ is generated from $N_d(2, I)$
and $\varepsilon \sim N(0, 0.2^2)$. The number of replications is 250 and the
sample size is $n = 400$. The simulation results are shown in Table 4. For the
chosen values of $a$, we see that EFM performs better than rMAVE and EDR.
As expected, however, more strongly oscillating functions are more difficult
to handle than less oscillating ones.



Table 4
Average estimation errors $\sum_{s=1}^{d} |\hat\beta_s - \beta_s|$ for model (3.6)

              $a = \pi/2$                   $a = 3\pi/4$
 d     rMAVE     EDR      EFM        rMAVE     EDR      EFM
10     0.0981    0.0918   0.0737     0.0970    0.0745   0.0725
50     0.5247    0.6934   0.4355     0.6350    1.8484   0.5407

Table 5
Estimation for $\beta$ of model (3.7) based on two randomly chosen samples

                 One group of samples         Another group of samples
                 $X_1$     $X_2$     $X_3$    $X_1$     $X_2$     $X_3$
GPLSIM est.      0.595*    0.568*    0.569*   0.563*    0.574*    0.595*
GPLSIM s.e.      0.013*    0.013*    0.013*   0.010*    0.010*    0.010*
EFM est.         0.579     0.575     0.577    0.573     0.577     0.580
EFM s.e.         0.011     0.011     0.011    0.010     0.010     0.010

*The values are adopted from Carroll et al. (1997). We abbreviate "estimator" to "est."
and "standard error" to "s.e.," which are computed from the sample version of the
quantity defined in (2.10).



Example 5 (Comparison of variance). To make our simulation results
comparable with those of Carroll et al. (1997), we mimic their simulation
setup. Data of size 200 are generated according to the following model:

$$Y_i = \sin\{\pi(\beta^T X_i - A)/(B - A)\} + \alpha Z_i + \varepsilon_i, \tag{3.7}$$

where $X_i$ are trivariate with independent $U(0, 1)$ components, $Z_i$ are
independent of $X_i$ with $Z_i = 0$ for $i$ odd and $Z_i = 1$ for $i$ even, and
$\varepsilon_i$ follow a normal distribution $N(0, 0.01)$ independent of both
$X_i$ and $Z_i$. The parameters are taken to be $\beta = (1, 1, 1)^T/\sqrt{3}$,
$\alpha = 0.3$, $A = \sqrt{3}/2 - 1.645/\sqrt{12}$ and
$B = \sqrt{3}/2 + 1.645/\sqrt{12}$. Note that the EFM approach is still
applicable to this model: because of the independence between $X$ and $Z$,
the conditionally centered response satisfies

$$Y_i - E(Y_i \mid Z_i) = a + \sin\{\pi(\beta^T X_i - A)/(B - A)\} + \varepsilon_i,$$

where the constant $a = -E[\sin\{\pi(\beta^T X_i - A)/(B - A)\}]$.
As $Z_i$ are dummy variables, estimating $E(Y_i \mid Z_i)$ is simple. Thus, when
we regard $Y_i - E(Y_i \mid Z_i)$ as the response, the model is still a
single-index model. Here the number of replications is 100. The method
derived from Carroll et al. (1997) is referred to as the GPLSIM approach.
The numerical results are reported in Table 5. The table shows that, compared
with the GPLSIM estimates, the EFM estimates have smaller bias and smaller
(or equal) variance. In this example both EFM and GPLSIM provide reasonably
accurate estimates.
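The centering step described in Example 5 amounts to subtracting the within-group mean of $Y$ for each value of the dummy variable $Z$. A minimal sketch, assuming the group sample mean is used as the estimate of $E(Y_i \mid Z_i)$ (the function name is hypothetical):

```python
def center_by_group(y, z):
    """Return y_i minus the sample mean of y within the group z_i."""
    sums, counts = {}, {}
    for yi, zi in zip(y, z):
        sums[zi] = sums.get(zi, 0.0) + yi
        counts[zi] = counts.get(zi, 0) + 1
    means = {k: sums[k] / counts[k] for k in sums}
    return [yi - means[zi] for yi, zi in zip(y, z)]
```

For example, `center_by_group([1, 2, 3, 6], [0, 1, 0, 1])` subtracts the mean 2 from the first group and the mean 4 from the second.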

Performance of the profile quasi-likelihood ratio test. To illustrate how the
profile quasi-likelihood ratio test performs for linear hypothesis problems, we
simulate the same data as above, except that we allow some components of
the index to follow the null hypothesis:

$$H_0: \beta_4 = \beta_5 = \cdots = \beta_d = 0.$$

We examine the power of the test under a sequence of alternative
hypotheses indexed by the parameter $\delta$ as follows:

$$H_1: \beta_4 = \delta, \qquad \beta_s = 0 \ \text{ for } s \ge 5.$$

When $\delta = 0$, the alternative hypothesis becomes the null hypothesis.

We examine the profile quasi-likelihood ratio test under a sequence of
alternative models that deviate progressively from the null hypothesis,
namely, as $\delta$ increases. The power functions are calculated at the
significance level 0.05, using the asymptotic distribution. We calculate test
statistics from 250 simulations by employing the fixed-point algorithm and
find the percentage of test statistics greater than or equal to the associated
quantile of the asymptotic distribution. The pictures in Figures 1, 2 and 3
illustrate the power function curves for two models under the given
significance levels. The power curves increase rapidly with $\delta$, which
shows that the profile quasi-likelihood ratio test is powerful. When $\delta$
is close to 0, the test sizes are all approximately equal to the significance
levels.

Fig. 2. Simulation results for Design (B) in Example 1. The left graphs depict the case
$\tau = 1.5$ with $\tau$ the first parameter in Beta($\tau$, 1). The right graphs are for $\tau = 0.75$.

Fig. 3. Simulation results for Example 2. The left graphs depict the case of Design (C)
with parameter dimension being 10 and 50. The right graphs are for Design (D).
[Panels: Design (C), d=10; Design (C), d=50; Design (D), d=10; Design (D), d=50;
horizontal axis: $\delta$.]
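The empirical power described above is the share of simulated test statistics at or above a chi-square quantile. The sketch below uses only the standard library, so the quantile is obtained from the Wilson-Hilferty approximation rather than an exact routine; that substitution, the function names and the choice of degrees of freedom ($d - 3$ for the null hypothesis above, as suggested by the $\chi^2(d - r)$ limit in the proof of Theorem 2.2) are assumptions of this example.

```python
from statistics import NormalDist


def chi2_quantile_wh(df, p):
    """Approximate chi-square quantile (Wilson-Hilferty cube approximation)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3


def rejection_rate(stats, critical):
    """Fraction of test statistics greater than or equal to the critical value."""
    return sum(1 for s in stats if s >= critical) / len(stats)


def empirical_power(stats, d, r=3, level=0.05):
    """Empirical power at the given level with d - r degrees of freedom."""
    return rejection_rate(stats, chi2_quantile_wh(d - r, 1.0 - level))
```

With $d = 10$ the approximate 0.95 quantile is about 14.05, close to the exact value of roughly 14.07; an exact quantile function could be substituted where one is available.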

3.3. A real data example. Income is, to some extent, regarded as an
index of a successful life. It is generally believed that demographic
information, such as education level, relationship in the household, marital
status, the fertility rate and gender, among others, affects income.
For example, Murray (1997) illustrated that adults with higher intelligence
have higher income. Kohavi (1996) predicted income using a Bayesian
classifier offered by a machine learning algorithm. Madalozzo (2008) examined
income differentials between married women and those who remain single or
cohabit by using multivariate linear regression. Here we use the
single-index model to explore the relationship between income and some of its
possible determinants.

We use the "Adult" database, which was extracted from the Census Bureau
database and is available at http://archive.ics.uci.edu/ml/datasets/Adult.
It was originally used to predict whether income exceeds
USD 50,000/year based on census data. The purpose of using this example
is to understand personal income patterns and to demonstrate the
performance of the EFM method in real data analysis. After excluding the few
observations with missing data, the data set in our study includes 30,162
subjects. The selected explanatory variables are:

• sex (categorical): 1 = Male, 0 = Female.

• native-country (categorical): 1 = United-States, 0 = others.

• work-class (categorical): 1 = Federal-gov, 2 = Local-gov, 3 = Private, 4 =
Self-emp-inc (self-employed, incorporated), 5 = Self-emp-not-inc
(self-employed, not incorporated), 6 = State-gov.

• marital-status (categorical): 1 = Divorced, 2 = Married-AF-spouse (married,
armed forces spouse present), 3 = Married-civ-spouse (married, civilian
spouse present), 4 = Married-spouse-absent [married, spouse absent
(exc. separated)], 5 = Never-married, 6 = Separated, 7 = Widowed.

• occupation (categorical): 1 = Adm-clerical (administrative support and
clerical), 2 = Armed-Forces, 3 = Craft-repair, 4 = Exec-managerial
(executive-managerial), 5 = Farming-fishing, 6 = Handlers-cleaners,
7 = Machine-op-inspct (machine operator inspection), 8 = Other-service,
9 = Priv-house-serv (private household services), 10 = Prof-specialty
(professional specialty), 11 = Protective-serv, 12 = Sales,
13 = Tech-support, 14 = Transport-moving.

• relationship (categorical): 1 = Husband, 2 = Not-in-family,
3 = Other-relative, 4 = Own-child, 5 = Unmarried, 6 = Wife.

• race (categorical): 1 = Amer-Indian-Eskimo, 2 = Asian-Pac-Islander, 3 =
Black, 4 = Other, 5 = White.

• age (integer): age in years, greater than or equal to 17.

• fnlwgt (continuous): The final sampling weights on the CPS files are
controlled to independent estimates of the civilian noninstitutional
population of the United States.

• education (ordinal): 1 = Preschool (less than 1st Grade), 2 = 1st-4th, 3 =
5th-6th, 4 = 7th-8th, 5 = 9th, 6 = 10th, 7 = 11th, 8 = 12th (12th Grade no
Diploma), 9 = HS-grad (high school Grad-Diploma or Equiv), 10 = Some-college
(some college but no degree), 11 = Assoc-voc (associate degree,
occupational/vocational), 12 = Assoc-acdm (associate degree, academic
program), 13 = Bachelors, 14 = Masters, 15 = Prof-school (professional
school), 16 = Doctorate.

• education-num (continuous): Number of years of education.

• capital-gain (continuous): A profit that results from investments into a
capital asset.

• capital-loss (continuous): A loss that results from investments into a
capital asset.

• hours-per-week (continuous): Usual number of hours worked per week.

Note that all the explanatory variables listed before "age" are categorical,
most of them with more than two categories. As such, we use dummy variables
to represent the corresponding categories. Specifically, every original
explanatory variable listed before "age" is coded by dummy variables, the
number of which equals the number of categories minus one. By doing
so, we then have 41 explanatory variables, where the first 35 are dummy
variables and the remaining ones are continuous. After a preliminary data
check, we find that the explanatory variables $X_{37}$ = "fnlwgt,"
$X_{39}$ = "capital-gain" and $X_{40}$ = "capital-loss" are very skewed to the
left and the latter two often take the value zero. Before fitting (3.8) we
first apply a logarithmic transformation to these three variables to obtain
log("fnlwgt"), log(1 + "capital-gain") and log(1 + "capital-loss"). To make
the explanatory variables comparable in scale, we standardize each of them
individually to have mean 0 and variance 1. Since "education" and
"education-num" are correlated, "education" is dropped from the model, which
results in a significantly smaller mean residual deviance.
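The preprocessing just described (dummy coding with one fewer column than categories, a log transform of the skewed variables, and standardization) can be sketched as follows. The function names are hypothetical, the choice of the last level as the reference category is an assumption suggested by the categories that do not appear in Table 6, and the population variance (divisor $n$) is used for standardization as an arbitrary convention.

```python
import math


def dummy_code(values, categories):
    """Code a categorical variable with len(categories) - 1 dummy columns.

    The last entry of `categories` is treated as the reference level
    (an assumption; the text only fixes the number of dummies).
    """
    levels = categories[:-1]
    return [[1.0 if v == c else 0.0 for c in levels] for v in values]


def log1p_transform(values):
    """log(1 + v), as applied to capital-gain and capital-loss."""
    return [math.log1p(v) for v in values]


def standardize(values):
    """Rescale to mean 0 and variance 1 (population variance)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]
```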

The single-index model will be used to model the relationship between
income and the relevant 43 predictors $X = (X_1, \ldots, X_{43})^T$:

$$P(\text{"income"} > 50{,}000 \mid X) = \frac{\exp\{g(\beta^T X)\}}{1 + \exp\{g(\beta^T X)\}}, \tag{3.8}$$

where $Y = I(\text{"income"} > 50{,}000)$, $\beta = (\beta_1, \ldots, \beta_{43})^T$
and $\beta_s$ represents the effect of the $s$th predictor. Formally, we are
testing the effect of gender, that is,

$$H_0: \beta_1 = 0 \quad \longleftrightarrow \quad H_1: \beta_1 \ne 0. \tag{3.9}$$

The fixed-point iterative algorithm is employed to compute the estimate
for $\beta$. To illustrate further the practical implications of this
approach, we compare our results to those obtained by using an ordinary
logistic regression (LR). The coefficients of the two models are given in
Table 6. To make the analyses presented in the table comparable, we consider
two standardizations. First, we standardize every explanatory variable to have
mean 0 and variance 1 so that the coefficients can be used to compare the
relative influence of different explanatory variables. However, such a
standardization does not allow us to compare the single-index model with the
ordinary logistic regression model. We therefore further normalize the
coefficients to have Euclidean norm 1, and the estimates of their standard
errors are adjusted accordingly. The single-index model provides more
reasonable results: $X_{38}$ = "education-num" has the strongest positive
effect on income; those who obtained a bachelor's degree or higher seem to
have much higher income than those with a lower education level. In contrast,
the results derived from the logistic regression show that
"married-civ-spouse" is the largest positive contributor.
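One simple reading of the second standardization is to divide the coefficient vector, and each standard error, by the Euclidean norm of the coefficients. This sketch treats the norm as fixed and ignores its sampling variability, which is an assumption of the example rather than something stated in the text; the function name is hypothetical.

```python
import math


def normalize_coefficients(coef, se):
    """Scale coefficients to unit Euclidean norm and rescale the s.e. alike."""
    norm = math.sqrt(sum(c * c for c in coef))
    return [c / norm for c in coef], [s / norm for s in se]
```

After this step the coefficient vectors of the single-index model and of the logistic regression both lie on the unit sphere, so individual entries can be compared directly.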

Some other interesting conclusions can be drawn from the output. Both "sex"
and "native-country" have a positive effect. Persons who worked without pay
in a family business, unpaid childcare and others earn a lower income than
persons who worked for wages or for themselves. The "fnlwgt" attribute has a
positive relation to income. Males are likely to make much more money than
females. The expected sign for every marital status other than the married
categories (Married-AF-spouse, Married-civ-spouse) is negative, given that
household production theory holds that division of work is efficient when
each member of a family dedicates his or her time to the more productive job.
Men usually receive relatively better compensation for their time in the
labor market than in home production. Thus, the expectation is that married
women dedicate more time to home tasks and less to the labor market, and this
would imply a different probability of working given the marital status
choice.

Table 6
Fitted coefficients for model (3.8) (estimated standard errors in parentheses)

Variables                  $\hat\beta$ of SIM       $\hat\beta$ of LR
Sex                         0.1102 (0.0028)          0.1975 (0.0181)
Native-country              [illegible]              [illegible]
Work-class
  Federal-gov               0.1237 (0.0059)          0.0739 (0.0108)
  Local-gov                 0.2044 (0.0065)          0.0155 (0.0135)
  Private                  -0.2603 (0.0075)          0.0775 (0.0200)
  Self-emp-inc              0.1252 (0.0068)          0.0520 (0.0112)
  Self-emp-not-inc          0.1449 (0.0066)         -0.0157 (0.0147)
Marital-status
  Divorced                 -0.0353 (0.0061)         -0.0304 (0.0264)
  Married-AF-spouse         0.0195 (0.0036)          0.0333 (0.0079)
  Married-civ-spouse        0.3257 (0.0150)          0.4545 (0.0754)
  Married-spouse-absent    -0.0115 (0.0029)         -0.0095 (0.0146)
  Never-married            -0.1876 (0.0085)         -0.1452 (0.0370)
  Separated                -0.0412 (0.0050)         -0.0221 (0.0179)
Occupation
  Adm-clerical             -0.0302 (0.0050)          0.0131 (0.0164)
  Armed-Forces             -0.0086 (0.0031)         -0.0091 (0.0131)
  Craft-repair             -0.0913 (0.0050)          0.0263 (0.0146)
  Exec-managerial           0.1813 (0.0061)          0.1554 (0.0148)
  Farming-fishing          -0.0370 (0.0036)         -0.0772 (0.0125)
  Handlers-cleaners        -0.0947 (0.0033)         -0.0662 (0.0153)
  Machine-op-inspct        -0.1067 (0.0038)         -0.0290 (0.0133)
  Other-service            -0.1227 (0.0045)         -0.1192 (0.0195)
  Priv-house-serv          -0.0501 (0.0020)         -0.0833 (0.0379)
  Prof-specialty            0.2502 (0.0065)          0.1153 (0.0160)
  Protective-serv           0.1954 (0.0061)          0.0508 (0.0095)
  Sales                     0.0316 (0.0050)          0.0615 (0.0147)
  Tech-support              0.0181 (0.0037)          0.0619 (0.0102)
Relationship
  Husband                  -0.1249 (0.0093)         -0.3264 (0.0254)
  Not-in-family            -0.0932 (0.0093)         -0.2074 (0.0612)
  Other-relative           -0.0958 (0.0038)         -0.1498 (0.0219)
  Own-child                -0.2218 (0.0076)         -0.3769 (0.0498)
  Unmarried                -0.1124 (0.0067)         -0.1739 (0.0446)
Race
  Amer-Indian-Eskimo       -0.0252 (0.0024)         -0.0226 (0.0109)
  Asian-Pac-Islander        0.0114 (0.0030)          0.0062 (0.0101)
  Black                    -0.0300 (0.0024)         -0.0182 (0.0111)
  Other                    -0.0335 (0.0021)         -0.0286 (0.0129)
Age                         0.2272 (0.0042)          0.1798 (0.0111)
Fnlwgt                      0.0099 (0.0028)          0.0414 (0.0092)
Education-num               0.4485 (0.0045)          0.3732 (0.0122)
Capital-gain                0.2859 (0.0055)          0.2582 (0.0084)
Capital-loss                0.1401 (0.0042)          0.1210 (0.0078)
Hours-per-week              0.2097 (0.0035)          0.1823 (0.0101)

Also, "race" influences income: Asians and Pacific Islanders seem
to make more money than other races. One's income also increases
significantly as working hours increase. Both "capital-gain" and
"capital-loss" have positive effects, which suggests that people who have
more money to invest make more money. The presence of young children has a
negative influence on income, while "age" accounts for the experience effect
and has a positive effect. Hence the conclusions based on the single-index
model are consistent with what we expect.

To help with interpretation of the model, plots of $\hat\beta^T X$ versus the
predicted response probability and versus $\hat g(\hat\beta^T X)$ are
generated, respectively, and can be found in Figure 4. When the estimated
single index is greater than 0, $\hat g(\hat\beta^T X)$ shows some degree of
curvature. An alternative choice is to fit the data using generalized
partially linear additive models (GPLAM) with nonparametric components for
the continuous explanatory variables. The relationships among "age,"
"fnlwgt," "capital-gain," "capital-loss" and "hours-per-week" all show
nonlinearity. The mean residual deviances of SIM, LR and GPLAM are 0.7811,
0.6747 and 0.6240, respectively. The SIM under study provides a slightly
worse fit than the others. However, we note that LR is, up to a link
function, linear in $X$ and, according to the results of GPLAM, which is a
more general model than LR, the actual relationship cannot have such a
structure. SIM can reveal nonlinear structure. On the other hand, although
the minimum mean residual deviance is, not surprisingly, attained by GPLAM,
this model has approximately 34 and 41 more degrees of freedom than SIM and
LR, respectively.

Fig. 4. Adult data: The left graph is a plot of predicted response probability based on the
single-index model. The right graph is the fitted curve for the unknown link function $g(\cdot)$.
[Panel titles: "Fitting based on EFM method"; "The fitted curve for the unknown link
functions"; horizontal axis: $\hat\beta^T X$.]



We now apply the quasi-likelihood ratio test to the test problem (3.9).
The QLR test statistic is 166.52 with one degree of freedom, resulting in a
P-value of less than $10^{-5}$. Hence this result provides strong evidence
that gender has a significant influence on high income.
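The size of this P-value is easy to check. For one degree of freedom, $P(\chi^2_1 \ge x) = P(|Z| \ge \sqrt{x}) = \operatorname{erfc}(\sqrt{x/2})$ with $Z$ standard normal, so the tail probability needs only the complementary error function. The function name below is an assumption made for the illustration.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


p_value = chi2_1df_pvalue(166.52)  # QLR statistic reported above
```

The resulting value is many orders of magnitude below $10^{-5}$, in line with the conclusion stated in the text.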

The Adult data set used in this paper is a rich data set. Existing work 
mainly focused on the prediction accuracy based on machine learning meth- 
ods. We make an attempt to explore the semiparametric regression pattern 
suitable for the data. Model specification and variable selection merit further 
study. 

APPENDIX: OUTLINE OF PROOFS 

We first introduce some regularity conditions. 
Regularity Conditions: 

(a) $\mu(\cdot)$, $V(\cdot)$, $g(\cdot)$, $h(\cdot) = E(X \mid \beta^T X = \cdot)$ have two bounded and continuous
derivatives. $V(\cdot)$ is uniformly bounded and bounded away from 0.

(b) Let $q(z, y) = \mu'(z)V^{-1}(z)\{y - \mu(z)\}$. Assume that $\partial q(z, y)/\partial z < 0$ for
$z \in \mathbb{R}$ and $y$ in the range of the response variable.

(c) The largest eigenvalue of $\Omega_{22}$ is bounded away from infinity.

(d) The density function $f_{\beta^T x}(\beta^T x)$ of the random variable $\beta^T X$ is bounded
away from 0 on $T_\beta$ and satisfies the Lipschitz condition of order 1 on
$T_\beta$, where $T_\beta = \{\beta^T x : x \in T\}$ and $T$ is a compact support set of $X$.

(e) Let $Q^*[\beta] = \int Q[\mu\{g(\beta^T x)\}, y]\, f(y \mid \beta^{0T} x)\, f(\beta^{0T} x)\, dy\, d(\beta^{0T} x)$ with $\beta^0$
denoting the true parameter value and $Q[\mu, y] = \int^{\mu} \frac{y - s}{V\{\mu^{-1}(s)\}}\, ds$. Assume
that $Q^*[\beta]$ has a unique maximum at $\beta = \beta^0$, and

$$E \sup_{\beta^{(1)}} \sup_{\beta^T X} \bigl|\mu'\{g(\beta^T X)\} V^{-1}\{g(\beta^T X)\}[Y - \mu\{g(\beta^T X)\}]\bigr|^2 < \infty$$

and $E\|X\|^2 < \infty$.

(f) The kernel $K$ is a bounded and symmetric density function with a
bounded derivative, and satisfies

$$\int_{-\infty}^{\infty} t^2 K(t)\, dt \ne 0 \quad \text{and} \quad \int_{-\infty}^{\infty} |t|^j K(t)\, dt < \infty, \quad j = 1, 2, \ldots$$
Condition (a) imposes mild smoothness conditions on the functions involved
in the model. We impose condition (b) to guarantee that the solutions of
(2.1), $\hat g(t)$ and $\hat g'(t)$, lie in a compact set. Condition (c)
implies that the second moment of the estimating equation (2.7),
$\operatorname{tr}(J^T \Omega J)$, is bounded. Then the CLT can be applied to
$G(\beta)$. Condition (d) means that $X$ may have discrete components and that
the density function of $\beta^T X$ is positive, which ensures that the
denominators involved in the nonparametric estimators are, with high
probability, bounded away from 0. The uniqueness condition in condition (e)
can be checked in the following case, for example. Assume that $Y$ is a
Poisson variable with mean $\mu\{g(\beta^T x)\} = \exp\{g(\beta^T x)\}$. The
maximizer $\beta^0$ of $Q^*[\beta]$ is equal to the solution of the equation

$$E\bigl[E\{[\exp\{g(\beta^{0T} X)\} - \exp\{g(\beta^T X)\}]\, g'(\beta^T X)\, J^T X \mid \beta^{0T} X\}\bigr] = 0.$$

$\beta^0$ is unique when $g'(\cdot)$ is not a zero-valued constant function and
the matrix $J^T E(XX^T) J$ is not singular. Under the second part of condition
(e), it is permissible to interchange differentiation and integration when
differentiating $E[Q[\mu\{g(\beta^T X)\}, Y]]$. Condition (f) is a commonly
used smoothness condition, satisfied by, for example, the Gaussian kernel and
the quadratic kernel. All of the conditions can be relaxed at the expense of
longer proofs.

Throughout the Appendix, $Z_n = O_P(a_n)$ denotes that $a_n^{-1} Z_n$ is bounded
in probability, and the derivation of the order of $Z_n$ is based on the fact
that $Z_n = O_P\{\sqrt{E(Z_n^2)}\}$. This allows us to apply the Cauchy-Schwarz
inequality to the quantity having stochastic order $a_n$.
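The fact used here follows from Markov's inequality applied to $Z_n^2$. The short derivation below is a standard argument added as a reminder; it is not taken from the original proofs.

```latex
P\Bigl(|Z_n| > M\sqrt{E(Z_n^2)}\Bigr)
  = P\bigl(Z_n^2 > M^2 E(Z_n^2)\bigr)
  \le \frac{E(Z_n^2)}{M^2 E(Z_n^2)}
  = \frac{1}{M^2}, \qquad M > 0 .
```

Choosing $M$ large makes the bound arbitrarily small uniformly in $n$, which is exactly the statement $Z_n = O_P\{\sqrt{E(Z_n^2)}\}$.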

A.1. Proof of Proposition 1. We outline the proof here, while the details
are given in the supplementary materials [Cui, Härdle and Zhu (2010)].

(i) Conditions (a), (b), (d) and (f) are essentially equivalent to the
conditions given by Carroll, Ruppert and Welsh (1998), and as a consequence
the derivation of the bias and variance of $\hat g(\beta^T x)$ and
$\hat g'(\beta^T x)$ is similar to that of Carroll, Ruppert and Welsh (1998).

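(ii) The first equation of (2.1) is

$$0 = \sum_{j=1}^{n} K_h(\beta^T X_j - \beta^T x)\, \mu'\{\hat\alpha_0 + \hat\alpha_1(\beta^T X_j - \beta^T x)\}\, V^{-1}\{\hat\alpha_0 + \hat\alpha_1(\beta^T X_j - \beta^T x)\}\, [Y_j - \mu\{\hat\alpha_0 + \hat\alpha_1(\beta^T X_j - \beta^T x)\}].$$

Taking derivatives with respect to $\beta^{(1)}$ on both sides, direct
observations lead to

$$\frac{\partial \hat\alpha_0}{\partial \beta^{(1)}} = \{B(\beta^T x)\}^{-1}\{A_1(\beta^T x) + A_2(\beta^T x) + A_3(\beta^T x)\},$$

where

$$B(\beta^T x) = -\sum_{j=1}^{n} K_h(\beta^T X_j - \beta^T x)\, q_z'\{\hat\alpha_0 + \hat\alpha_1(\beta^T X_j - \beta^T x), Y_j\},$$

$$A_1(\beta^T x) = \sum_{j=1}^{n} K_h(\beta^T X_j - \beta^T x)\, J^T(X_j - x)\, q_z'\{\hat\alpha_0 + \hat\alpha_1(\beta^T X_j - \beta^T x), Y_j\}\, \hat\alpha_1,$$

$$A_2(\beta^T x) = \sum_{j=1}^{n} K_h(\beta^T X_j - \beta^T x)\, q_z'\{\hat\alpha_0 + \hat\alpha_1(\beta^T X_j - \beta^T x), Y_j\}\, (\beta^T X_j - \beta^T x)\, \frac{\partial \hat\alpha_1}{\partial \beta^{(1)}},$$

$$A_3(\beta^T x) = \sum_{j=1}^{n} h^{-1} K_h'(\beta^T X_j - \beta^T x)\, J^T(X_j - x)\, q\{\hat\alpha_0 + \hat\alpha_1(\beta^T X_j - \beta^T x), Y_j\}$$

with $K_h'(\cdot) = h^{-1} K'(\cdot/h)$. Note that
$\partial \hat\alpha_0/\partial \beta^{(1)} = \partial \hat g(\beta^T x)/\partial \beta^{(1)}$; then we have

$$\frac{\partial \hat g(\beta^T x)}{\partial \beta^{(1)}} = \{B(\beta^T x)\}^{-1} A_1(\beta^T x) + \{B(\beta^T x)\}^{-1} A_2(\beta^T x) + \{B(\beta^T x)\}^{-1} A_3(\beta^T x). \tag{A.1}$$

We will prove that

$$E\bigl\|\{B(\beta^T x)\}^{-1} A_1(\beta^T x) - g'(\beta^T x)\, J^T\{x - h(\beta^T x)\}\bigr\|^2 = O_P(h^4 + n^{-1} h^{-3}), \tag{A.2}$$

the second term in (A.1) is of order $O_P(h + n^{-1} h)$, and the third term
is of order $O_P(h^4 + n^{-1} h^{-3})$. The combination of (A.1) and these three
results leads directly to result (ii) of Proposition 1. The detailed proof is
summarized in three steps and is given in the supplementary materials [Cui,
Härdle and Zhu (2010)].

(iii) By mimicking the proof of (ii), we can show that (iii) holds. See
the supplementary materials for details.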

A.2. Proofs of (2.6) and (2.7). These are proved in the supplementary
materials [Cui, Härdle and Zhu (2010)].

A.3. Proof of Theorem 2.1. (i) Note that the estimating equation defined
in (2.6) is just the gradient of the following quasi-likelihood:

$$\hat Q(\beta) = \sum_{i=1}^{n} Q[\mu\{\hat g(\beta^T X_i)\}, Y_i]$$

with $Q[\mu, y] = \int^{\mu} \frac{y - s}{V\{\mu^{-1}(s)\}}\, ds$, where $\mu^{-1}(\cdot)$ is the inverse function of
$\mu(\cdot)$. Then for $\beta^{(1)}$ satisfying
$(\sqrt{1 - \|\beta^{(1)}\|^2}, \beta^{(1)T})^T \in \Theta$, we have

$$\hat\beta^{(1)} = \arg\max_{\beta^{(1)}} \hat Q(\beta).$$

The proof is based on Theorem 5.1 in Ichimura (1993). In that theorem the
consistency of $\hat\beta^{(1)}$ is proved by means of proving that

$$\sup \Bigl| \frac{1}{n}\sum_{i=1}^{n} \cdots - \frac{1}{n}\sum_{i=1}^{n} \cdots \Bigr| = o_P(1), \tag{A.3}$$

$$\sup \Bigl| \frac{1}{n}\sum_{i=1}^{n} \cdots - \frac{1}{n}\sum_{i=1}^{n} \cdots \Bigr| = o_P(1) \tag{A.4}$$

and

$$\Bigl| \frac{1}{n}\sum_{i=1}^{n} \cdots - \frac{1}{n}\sum_{i=1}^{n} \cdots \Bigr| = o_P(1). \tag{A.5}$$

The validity of (A.5) follows directly from (A.3) and (A.4).
Uniform convergence results of the type (A.4) have been well established
in the literature; see, for example, Andrews (1987). We now verify
the validity of (A.3), which reduces to showing the uniform convergence of
the estimator $\hat g(t)$ under condition (e) [see Ichimura (1993)]. This can
be obtained in a similar way as in Kong, Linton and Xia (2010), taking into
account that the regularity conditions imposed in Theorem 2.1 are stronger
than the corresponding ones in that paper.

(ii) Recall the notation $J$, $\Omega$ and $G(\beta)$ introduced in Section 2.
By (2.7), we have shown that

$$\sqrt{n}(\hat\beta^{(1)} - \beta^{(1)0}) = \frac{1}{\sqrt{n}}\{J^T \Omega J\}^{+} G(\beta) + o_P(1). \tag{A.6}$$

Theorem 2.1 follows directly from the above asymptotic expansion and the
fact that $E\{G(\beta)G^T(\beta)\} = n J^T \Omega J$. □



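A.4. Proof of Corollary 1. The asymptotic covariance of $\hat\beta$ can be
obtained by adjusting the asymptotic covariance of $\hat\beta^{(1)}$ via the
multivariate delta method, and is of the form $J(J^T \Omega J)^{+} J^T$. Next we
compare this asymptotic covariance with that (denoted by $\Omega^{+}$) given in
Carroll et al. (1997). Write $\Omega$ as

$$\Omega = \begin{pmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22} \end{pmatrix},$$

where $\Omega_{22}$ is a $(d - 1) \times (d - 1)$ matrix. We investigate two cases
in turn: $\det(\Omega_{22}) \ne 0$ and $\det(\Omega_{22}) = 0$. Let
$\alpha = -\beta^{(1)}/\sqrt{1 - \|\beta^{(1)}\|^2} = -\beta^{(1)}/\beta_1$.

Consider the case that $\det(\Omega_{22}) \ne 0$. Because
$\operatorname{rank}(\Omega) = d - 1$, $\det(\Omega_{11}\Omega_{22} - \Omega_{21}\Omega_{12}) = 0$.
Since $\Omega_{22}$ is nondegenerate, it can be easily shown that
$\Omega_{11} = \Omega_{12}\Omega_{22}^{-1}\Omega_{21}$. Combining this with the
following fact:

$$J^T \Omega J = (\alpha \;\; I_{d-1}) \begin{pmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22} \end{pmatrix} \begin{pmatrix} \alpha^T \\ I_{d-1} \end{pmatrix},$$

we can get that $J^T \Omega J$ is nondegenerate. In this situation, its inverse
$(J^T \Omega J)^{+}$ is just the ordinary inverse $(J^T \Omega J)^{-1}$. Then
$J(J^T \Omega J)^{+} J^T = \{J(J^T \Omega J)^{-1/2}\}\{(J^T \Omega J)^{-1/2} J^T\}$,
a full-rank decomposition. Then

$$\{J(J^T \Omega J)^{+} J^T\}^{+} = \{J(J^T \Omega J)^{-1/2}\} \times \{(J^T \Omega J)^{-1/2} J^T J (J^T \Omega J)^{-1} J^T J (J^T \Omega J)^{-1/2}\}^{-1} \times \{(J^T \Omega J)^{-1/2} J^T\}$$
$$= J(J^T J)^{-1} J^T \Omega J (J^T J)^{-1} J^T = \Omega.$$

This means that $J(J^T \Omega J)^{+} J^T = \Omega^{+}$.

When $\det(\Omega_{22}) = 0$, we can obtain that

$$\Omega^{+} = \begin{pmatrix} 1/\Omega_{11} + \Omega_{12}\Omega_{22.1}^{+}\Omega_{21}/\Omega_{11}^2 & -\Omega_{12}\Omega_{22.1}^{+}/\Omega_{11} \\ -\Omega_{22.1}^{+}\Omega_{21}/\Omega_{11} & \Omega_{22.1}^{+} \end{pmatrix}$$

with $\Omega_{22.1} = \Omega_{22} - \Omega_{21}\Omega_{12}/\Omega_{11}$. Write
$J(J^T \Omega J)^{+} J^T$ as

$$\begin{pmatrix} \alpha^T (J^T \Omega J)^{+} \alpha & \alpha^T (J^T \Omega J)^{+} \\ (J^T \Omega J)^{+} \alpha & (J^T \Omega J)^{+} \end{pmatrix}.$$

Note that $J^T \Omega J = \Omega_{22.1} + (\Omega_{21}/\sqrt{\Omega_{11}} + \sqrt{\Omega_{11}}\,\alpha)(\Omega_{12}/\sqrt{\Omega_{11}} + \sqrt{\Omega_{11}}\,\alpha^T)$,
so $J^T \Omega J \ge \Omega_{22.1}$. Combining this with
$\operatorname{rank}(\Omega_{22}) = d - 2$, we have that
$(J^T \Omega J)^{+} \le \Omega_{22.1}^{+}$. It is easy to check that
$\alpha^T \Omega_{22.1} = 0$, so $\alpha \perp \operatorname{span}(\Omega_{22.1})$
and $\alpha^T \Omega_{22.1}^{+} \alpha = 0$, and then
$\alpha^T (J^T \Omega J)^{+} = 0$. In this situation,
$J(J^T \Omega J)^{+} J^T \le \Omega^{+}$ and the strict less-than sign holds since
$J^T \Omega J \ne \Omega_{22.1}$ and $1/\Omega_{11} > 0$. □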
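For reference, the pseudo-inverse computation in the nondegenerate case rests on a standard identity for full-rank decompositions, restated here as a reminder rather than as part of the original proof. If $A = FG$ with $F$ of full column rank and $G$ of full row rank, then

```latex
A^{+} = G^{T}(GG^{T})^{-1}(F^{T}F)^{-1}F^{T}
      = G^{T}\bigl(F^{T} A\, G^{T}\bigr)^{-1} F^{T}.
```

Taking $F = J(J^T \Omega J)^{-1/2}$ and $G = F^T$ gives the three-factor product used in the proof, with $F^T A G^T$ as the middle factor to be inverted.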

A. 5. Proof of The orem 2.2. U nder Hq, we can rewrite the index vec- 
tor as j3= [e B] T ( V / 1 - fjcJC 1 ) || 2 ,u;W r ) T where e = (1,0,..., 0) T is an r- 
dimensional vector, 

B -U, i 
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is an r x (d — 1) matrix and u>W = (/?2, • ■ • , (3 r ) T is an (r — 1) x 1 vector. 
Let u> = (a/1 — ||u;( 1 )|| 2 ,u/ 1 ) T ) T . So under iifo the estimator is also the local 
maximizer u> of the problem 

Q([e B] T £) = sup Q([e B] T u;). 

||u>M||<l 

Expanding Q(B T a>) at by a Taylor's expansion and noting that dQ((3)/ 
#/3 (1) 1^(1)^(1) =0, then Q( / 3)-Q(B T ^)=T 1 + r 2 + o P (l), where 



a 2 Q(/3) 



(£(i) _ B T £) 



Ti = --(pW -B T £, 
T 2 = i0«-B T c2;) T 

O 

-B T ^) T ^Q(/3)/(9/3( 1 )9/3^)| /3(1)=/ 3 (1) ( j a( 1 ) -B T c2>)} 

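Assuming the conditions in Theorem 2.1 and under the null hypothesis $H_0$,
it is easy to show that

$$
\sqrt{n}(B^{T}\hat\omega - B^{T}\omega) = \frac{1}{\sqrt{n}}B^{T}B(J^{T}\Omega J)^{+}G(\beta) + o_P(1).
$$

Combining this with (A.6), under the null hypothesis $H_0$,

$$
\begin{aligned}
\sqrt{n}(\hat\beta^{(1)} - B^{T}\hat\omega)
&= \frac{1}{\sqrt{n}}(J^{T}\Omega J)^{1/2+}\{I_{d-1} - (J^{T}\Omega J)^{1/2}B^{T}B(J^{T}\Omega J)^{1/2+}\} \\
&\quad\times (J^{T}\Omega J)^{1/2+}G(\beta) + o_P(1).
\end{aligned}
\tag{A.7}
$$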

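Since $\frac{1}{\sqrt{n}}G(\beta) = O_P(1)$, $\partial^{2}Q(\beta)/\partial\beta^{(1)}\partial\beta^{(1)T} = -nJ^{T}\Omega J + o_P(n)$ and the matrix
$J^{T}\Omega J$ has eigenvalues uniformly bounded away from 0 and infinity, we have
$\|\hat\beta^{(1)} - B^{T}\hat\omega\| = O_P(n^{-1/2})$ and then $|T_2| = o_P(1)$. Combining this and
(A.7), we have

$$
\begin{aligned}
Q(\hat\beta) - Q(B^{T}\hat\omega)
&= \frac{n}{2}(\hat\beta^{(1)} - B^{T}\hat\omega)^{T}J^{T}\Omega J(\hat\beta^{(1)} - B^{T}\hat\omega) + o_P(1) \\
&= \frac{1}{2n}G^{T}(\beta)(J^{T}\Omega J)^{1/2+}P(J^{T}\Omega J)^{1/2+}G(\beta) + o_P(1)
\end{aligned}
$$

with $P = I_{d-1} - (J^{T}\Omega J)^{1/2}B^{T}B(J^{T}\Omega J)^{1/2+}$. Here $P$ is idempotent having
rank $d-r$, so it can be written as $P = S^{T}S$ where $S$ is a $(d-r) \times (d-1)$
matrix satisfying $SS^{T} = I_{d-r}$. Consequently,

$$
2\{Q(\hat\beta) - Q(B^{T}\hat\omega)\}
= \Big(\frac{1}{\sqrt{n}}S(J^{T}\Omega J)^{1/2+}G(\beta)\Big)^{T}\Big(\frac{1}{\sqrt{n}}S(J^{T}\Omega J)^{1/2+}G(\beta)\Big) + o_P(1)
\stackrel{\mathcal{L}}{\longrightarrow} \chi^{2}(d-r).
$$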
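The idempotence of $P$ and its rank $d-r$, which determine the degrees of freedom of the limiting chi-square distribution, can be illustrated numerically. The sketch below rests on assumptions that are not taken from the paper: a random positive definite matrix stands in for $J^{T}\Omega J$, $B$ is taken to be the $r \times (d-1)$ matrix with an $I_{r-1}$ block in its lower-left corner and zeros elsewhere, and the function name is hypothetical. It checks only $P^{2} = P$ and $\mathrm{tr}(P) = d-r$ (for an idempotent matrix the trace equals the rank).

```python
import numpy as np

def projection_check(d=6, r=3, seed=2):
    """Return (max |P P - P|, trace(P)) for P = I - A^{1/2} B^T B A^{-1/2}."""
    rng = np.random.default_rng(seed)
    M = rng.normal(size=(d - 1, d - 1))
    A = M @ M.T + np.eye(d - 1)          # stand-in for J^T Omega J (assumed nondegenerate)
    w, V = np.linalg.eigh(A)
    A_half = (V * np.sqrt(w)) @ V.T      # symmetric square root of A
    A_half_inv = (V / np.sqrt(w)) @ V.T  # its inverse
    # Assumed form of B: zeros except an identity block of size r-1.
    B = np.zeros((r, d - 1))
    B[1:, : r - 1] = np.eye(r - 1)
    P = np.eye(d - 1) - A_half @ B.T @ B @ A_half_inv
    idem_err = float(np.max(np.abs(P @ P - P)))
    return idem_err, float(np.trace(P))

print(projection_check())
```

With the default arguments the trace should equal $d - r = 3$ up to rounding, matching the degrees of freedom in the Wilks type result.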
SUPPLEMENTARY MATERIAL

Supplementary materials (DOI: 10.1214/10-AOS871SUPP; .pdf). Com- 
plete proofs of Proposition 1, (2.6) and (2.7). 

REFERENCES 

Andrews, D. W. K. (1987). Consistency in nonlinear econometric models: A generic
uniform law of large numbers. Econometrica 55 1465-1471. MR0923471

Carroll, R. J., Ruppert, D. and Welsh, A. H. (1998). Local estimating equations. 
J. Amer. Statist. Assoc. 93 214-227. MR1614624 

Carroll, R. J., Fan, J., Gijbels, I. and Wand, M. P. (1997). Generalized partially 
linear single-index models. J. Amer. Statist. Assoc. 92 447-489. MR1467842 

Chang, Z. Q., Xue, L. G. and Zhu, L. X. (2010). On an asymptotically more efficient 
estimation of the single-index model. J. Multivariate Anal. 101 1898-1901. MR2651964 

Cui, X., Härdle, W. and Zhu, L. (2010). Supplementary materials for "The EFM ap-
proach for single-index models." DOI:10.1214/10-AOS871SUPP.

Fan, J. and Gijbels, I. (1996). Local Polynomial Modeling and Its Applications. Chapman 
& Hall, London. MR1383587 

Fan, J., Heckman, N. E. and Wand, M. P. (1995). Local polynomial kernel regression 
for generalized linear models and quasi-likelihood functions. J. Amer. Statist. Assoc. 
90 141-150. MR1325121 

Fan, J. and Jiang, J. (2007). Nonparametric inference with generalized likelihood ratio 
test. Test 16 409-478. MR2365172 

Härdle, W., Hall, P. and Ichimura, H. (1993). Optimal smoothing in single-index
models. Ann. Statist. 21 157-178. MR1212171

Härdle, W. and Mammen, E. (1993). Testing parametric versus nonparametric regres-
sion. Ann. Statist. 21 1926-1947. MR1245774

Härdle, W., Mammen, E. and Müller, M. (1998). Testing parametric versus semipara-
metric modelling in generalized linear models. J. Amer. Statist. Assoc. 93 1461-1474.
MR1666641

Härdle, W., Mammen, E. and Proença, I. (2001). A bootstrap test for single index
models. Statistics 35 427-452. MR1880174

Härdle, W. and Stoker, T. M. (1989). Investigating smooth multiple regression by
method of average derivatives. J. Amer. Statist. Assoc. 84 986-995. MR1134488

Heyde, C. C. (1997). Quasi-likelihood and Its Application: A General Approach to Opti-
mal Parameter Estimation. Springer, New York. MR1461808

Horowitz, J. L. and Härdle, W. (1996). Direct semiparametric estimation of a single-
index model with discrete covariates. J. Amer. Statist. Assoc. 91 1632-1640. MR1439104

Hristache, M., Juditski, A. and Spokoiny, V. (2001). Direct estimation of the index 
coefficients in a single-index model. Ann. Statist. 29 595-623. MR1865333 

Hristache, M., Juditsky, A., Polzehl, J. and Spokoiny, V. (2001). Structure adaptive 
approach for dimension reduction. Ann. Statist. 29 1537-1566. MR1891738 

Huh, J. and Park, B. U. (2002). Likelihood-based local polynomial fitting for single-index
models. J. Multivariate Anal. 80 302-321. MR1889778

Ichimura, H. (1993). Semiparametric least squares (SLS) and weighted SLS estimation 
of single-index models. J. Econometrics 58 71-120. MR1230981 






Kane, M., Holt, J. and Allen, B. (2004). Results concerning the generalized partially 
linear single-index model. J. Stat. Comput. Simul. 72 897-912. MR2100843 

Kohavi, R. (1996). Scaling up the accuracy of naive-Bayes classifiers: A decision-tree 
hybrid. In Proceedings of the Second International Conference on Knowledge Discovery 
and Data Mining 202-207. AAAI Press, Menlo Park, CA. 

Kong, E., Linton, O. and Xia, Y. (2010). Uniform Bahadur representation for local
polynomial estimates of M-regression and its application to the additive model. Econo- 
metric Theory 26 1529-1564. MR2684794 

Lin, W. and Kulasekera, K. B. (2007). Identifiability of single-index models and 
additive-index models. Biometrika 94 496-501. MR2380574 

Madalozzo, R. C. (2008). An analysis of income differentials by marital status. Estudos
Econômicos 38 267-292.

McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. Chapman
& Hall, London.

Murray, C. (1997). IQ and economic success. The Public Interest 128 21-35. 

Polzehl, J. and Sperlich, S. (2009). A note on structural adaptive dimension reduction. 
J. Stat. Comput. Simul. 79 805-818. MR2751594 

Powell, J. L., Stock, J. H. and Stoker, T. M. (1989). Semiparametric estimation of 
index coefficients. Econometrica 57 1403-1430. MR1035117 

Wang, H. and Xia, Y. (2008). Sliced regression for dimension reduction. J. Amer. Statist.
Assoc. 103 811-821. MR2524332 

Wang, J. L., Xue, L. G., Zhu, L. X. and Chong, Y. S. (2010). Estimation for a partial-linear
single-index model. Ann. Statist. 38 246-274. MR2589322

Xia, Y. (2006). Asymptotic distributions for two estimators of the single-index model.
Econometric Theory 22 1112-1137. MR2328530 

Xia, Y., Tong, H., Li, W. K. and Zhu, L. (2002). An adaptive estimation of dimension
reduction space (with discussions). J. R. Stat. Soc. Ser. B Stat. Methodol. 64 363-410. 
MR1924297 

Yu, Y. and Ruppert, D. (2002). Penalized spline estimation for partially linear single 
index models. J. Amer. Statist. Assoc. 97 1042-1054. MR1951258 

Zhou, J. and He, X. (2008). Dimension reduction based on constrained canonical corre- 
lation and variable filtering. Ann. Statist. 36 1649-1668. MR2435451 

Zhu, L. X. and Xue, L. G. (2006). Empirical likelihood confidence regions in a par- 
tially linear single-index model. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 549-570. 
MR2278341 

Zhu, L. P. and Zhu, L. X. (2009a). Nonconcave penalized inverse regression in single- 
index models with high dimensional predictors. J. Multivariate Anal. 100 862-875. 
MR2498719 

Zhu, L. P. and Zhu, L. X. (2009b). On distribution weighted partial least squares with 
diverging number of highly correlated predictors. J. R. Stat. Soc. Ser. B Stat. Methodol. 
71 525-548. MR2649607 



X. Cui 

School of Mathematics 

and Computational Science 
Sun Yat-sen University 
Guangzhou 

Guangdong Province, 510275 
P.R. China 

E-MAIL: cuixia@mail.sysu.edu.cn 



W. K. Härdle

CASE-Center for Applied Statistics

and Economics
Humboldt-Universität zu Berlin
Wirtschaftswissenschaftliche Fakultät
Spandauer Str. 1 
10178 Berlin 
Germany 

E-mail: haerdle@wiwi.hu-berlin.de






L. Zhu 

FSC1207, Fong Shu Chuen Building 
Department of Mathematics 
Hong Kong Baptist University 
Kowloon Tong 
Hong Kong 
P.R. China 

E-MAIL: lzhu@hkbu.edu.hk



